Vall-E Backend Training: "list indices must be integers or slices, not dict" #339


Bluebomber182 · 2023-08-23T17:04:07Z

Bluebomber182 commented

2023-08-23 17:04:07 +00:00

I get this error message if I don't enable Slice Segments in the prepare dataset section. Is there a way to prepare the dataset without enabling Slice Segments if I have already sliced the audio beforehand?
list indices must be integers or slices, not dict
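For readers unfamiliar with the message: it is Python's standard `TypeError` for indexing a list with a dict. The sketch below only reproduces the message in isolation; the names `segments` and `key` are hypothetical and this is not the web UI's actual code path.

```python
def index_list_with_dict() -> str:
    """Index a list with a dict and return the resulting error message."""
    segments = ["clip_0.wav", "clip_1.wav"]  # hypothetical list of items
    key = {"start": 0.0, "end": 1.5}         # hypothetical dict used as an index by mistake
    try:
        return segments[key]                 # raises TypeError: a list index must be an int or slice
    except TypeError as exc:
        return str(exc)


message = index_list_with_dict()
print(message)
```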


mrq commented

2023-08-23 21:01:56 +00:00

I guess that would explain #335 better. I'll finagle with the web UI and see why it's breaking when not slicing.

In the meantime you should be able to enable slicing even if things look fine anyways. To err on the side of caution, set the offsets to something like -100 and 100 so you don't have to play around with finding the right slice offsets (faster-whisper-based WhisperX has different offsets than normal openai/whisper or anything based on that, and I don't recall safe slice offsets).

I will preface, though, that if you were looking to inference, you do not need to prepare a dataset. Unlike Bark's integration, you just need `./voices/{voice}/` for it.

If you were looking to finetune, I would not use the web UI's config generator + training at the moment, as I have not updated those in a long, long time. In the meantime, after preparing the dataset:

* modify the `./training/valle/config.yaml`'s:
  - `dataset.training` to `["./training/{voice}/valle/"]`
  - `dataset.speaker_name_getter` to `"lambda p: f'{p.parts[-2]}'"`
  - `dataset.use_hdf5` to `False`
* to train, with the current working directory set to your `ai-voice-cloning` folder, run: `deepspeed --module vall_e.train yaml="./training/valle/config.yaml"`.
  - you might need to prepend `CUDA_HOME=/path/to/your/cuda/folder/` (mine is `/opt/cuda/` but might be `/usr/local/cuda/`) or `ROCM_HOME=/path/to/your/rocm/folder/`, if using ROCm (mine is `/opt/rocm/`).
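The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`config_overrides`, `train_command`, `train_env`) are hypothetical, the nested-dict layout is inferred from the dotted key names, and the sketch neither edits the YAML file nor launches training by itself.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def config_overrides(voice: str) -> dict:
    """The three dataset settings from the comment, as a nested dict (layout assumed)."""
    return {
        "dataset": {
            "training": [f"./training/{voice}/valle/"],
            "speaker_name_getter": "lambda p: f'{p.parts[-2]}'",
            "use_hdf5": False,
        }
    }


def train_command(config_path: str = "./training/valle/config.yaml") -> List[str]:
    """The training command from the comment, split into an argument list."""
    return ["deepspeed", "--module", "vall_e.train", f"yaml={config_path}"]


def train_env(cuda_home: Optional[str] = None,
              rocm_home: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment, optionally setting CUDA_HOME or ROCM_HOME."""
    env = dict(os.environ)
    if cuda_home:
        env["CUDA_HOME"] = cuda_home
    if rocm_home:
        env["ROCM_HOME"] = rocm_home
    return env


# The getter string is a Python lambda that returns the second-to-last path
# component. Assumption: it is called with a pathlib.Path; the file name
# below is made up for illustration.
getter = eval(config_overrides("myvoice")["dataset"]["speaker_name_getter"])
speaker = getter(Path("./training/myvoice/valle/clip_0001.wav"))
```

To actually launch training, one could pass these to `subprocess.run(train_command(), env=train_env(cuda_home="/opt/cuda/"), cwd="/path/to/ai-voice-cloning", check=True)`, where the `cwd` value is a placeholder for your own checkout.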


mrq commented

2023-08-23 21:52:38 +00:00

The root issue should be fixed in commit 29290f574e.

Additionally, if you were going to finetune with the web UI, generating the training YAML should be working again in commit 0a5483e57a, as I had needed to update the template YAML. I do not know how well it works to train under the web UI, though.


Bluebomber182 closed this issue

2023-08-24 13:57:29 +00:00
