Vall-E Backend Training: "list indices must be integers or slices, not dict" #339

Closed
opened 2023-08-23 17:04:07 +00:00 by Bluebomber182 · 2 comments

I get this error message if I don't enable Slice Segments in the prepare dataset section. Is there a way to prepare the dataset without enabling Slice Segments if I already sliced the audio beforehand?
list indices must be integers or slices, not dict

Owner

I guess that would explain #335 better. I'll finagle with the web UI and see why it's breaking when not slicing.
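
For reference, the message itself is Python's standard `TypeError` for indexing a list with a dict. The snippet below is only a minimal sketch of how that message arises; the names `segments` and `key` are hypothetical and this is not the web UI's actual code.

```python
# Hypothetical data: a list of segment records, as a transcription step might produce.
segments = [{"start": 0.0, "end": 1.5, "text": "hello"}]

# Hypothetical mistake: using a dict where an integer index is expected.
key = {"id": 0}

try:
    segments[key]
except TypeError as exc:
    # Capture the message so it can be inspected or logged.
    message = str(exc)

print(message)
```

A list can only be indexed with an integer or a slice, so this kind of error usually means one code path received a list where another expected a dict, or the reverse.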

In the meantime, you should be able to enable slicing even if your audio already looks fine. To err on the side of caution, set the offsets to something like -100 and 100 so you don't have to play around with finding the right slice offsets (faster-whisper-based WhisperX has different offsets than normal openai/whisper or anything based on it, and I don't recall safe slice offsets).

I will preface, though, that if you are only looking to run inference, you do not need to prepare a dataset. Unlike for Bark's integration, you just need the ./voices/{voice}/ folder for it.

If you are looking to finetune, I would not use the web UI's config generator + training at the moment, as I have not updated those in a long, long time. In the meantime, after preparing the dataset:

  • modify the ./training/valle/config.yaml's:
    • dataset.training to ["./training/{voice}/valle/"]
    • dataset.speaker_name_getter to "lambda p: f'{p.parts[-2]}'"
    • dataset.use_hdf5 to False
  • to train, with the current working directory set to your ai-voice-cloning folder, run: deepspeed --module vall_e.train yaml="./training/valle/config.yaml".
    • you might need to prepend CUDA_HOME=/path/to/your/cuda/folder/ (mine is /opt/cuda/ but might be /usr/local/cuda/) or ROCM_HOME=/path/to/your/rocm/folder/, if using ROCm (mine is /opt/rocm/).
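
To see what the speaker_name_getter value above computes, here is a minimal sketch using only the standard library. The example path and file name are hypothetical, and the thread does not say exactly which path the trainer passes to the getter, so treat this as an illustration of `parts[-2]` rather than of the trainer's behaviour.

```python
from pathlib import PurePosixPath

# Same expression as in the config, written as a plain lambda instead of a string.
speaker_name_getter = lambda p: f"{p.parts[-2]}"

# Hypothetical path, for illustration only.
example = PurePosixPath("training/myvoice/valle/sample_0001.wav")

# parts is a tuple of path components; index -2 is the immediate parent directory.
components = example.parts
speaker = speaker_name_getter(example)

print(components)
print(speaker)
```

For a file path, `parts[-2]` is the name of the directory that directly contains the file, so every file in the same folder gets the same speaker name.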
Owner

The root issue should be fixed in commit 29290f574e.

Additionally, if you were going to finetune with the web UI, generating the training YAML should be working again in commit 0a5483e57a, as I had needed to update the template YAML. I do not know how well it works to train under the web UI, though.


Reference: mrq/ai-voice-cloning#339