It's not an error, just a warning. Does it still work?
What's the recommended number of epochs when training on a dataset of 200 clips vs something like 1000 (assuming they're all cut down to between 1 and 11 seconds and transcribed properly)?
…
I had actually assumed that, since I couldn't get the program to work for me, I could just skip this step by transcribing each .wav file in the whisper.json file manually, line by line.…
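(For what it's worth, if hand-editing gets tedious, you could generate the transcripts with the openai-whisper package directly and then adapt the output to whatever layout the tool's whisper.json actually uses. The JSON structure and paths below are only a guess, not the tool's real schema — a minimal sketch:)

import json
from pathlib import Path

import whisper  # pip install openai-whisper

model = whisper.load_model("base")  # pick a model size that fits your VRAM
results = {}

# "./voices/MyVoice" is an example path; point it at your own clip folder.
for wav in sorted(Path("./voices/MyVoice").glob("*.wav")):
    out = model.transcribe(str(wav))
    results[wav.name] = out["text"].strip()

# NOTE: the structure written here is an assumption -- compare against a
# whisper.json the tool produced itself and match that layout before using it.
with open("whisper.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)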
After you've trained a model am I correct in saying that the voice chunks should be set to 0 when you're using that model?
IIUC, when set to 0 it'll attempt to calculate a size automatically…
That makes sense: since there's no actual option to refresh the "dataset source" list, there was no way for me to select my voice from it.
"Refresh Voice List" on the Generate tab will…
Hmm. What's in the directory for the voice you're attempting to prepare the dataset from? Are the files valid .wav's?
What do you have set as your Whisper Backend?
The .pth files are actually zips. See if you can open your 300_gpt.pth in 7z or a similar archive program. If it's corrupted you might be out of luck.
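If you'd rather check it from a script, Python's zipfile module can tell you whether the checkpoint is still a readable archive (newer torch.save checkpoints are zip-based). The path below is just an example; point it at wherever your 300_gpt.pth lives:

# Quick integrity check: modern torch.save() checkpoints are zip archives,
# so if zipfile can't read it, the file is probably corrupted.
import zipfile

path = "300_gpt.pth"  # example path, adjust to yours

if zipfile.is_zipfile(path):
    with zipfile.ZipFile(path) as z:
        bad = z.testzip()  # returns the first bad member, or None
        print("OK" if bad is None else f"Corrupted member: {bad}")
else:
    print("Not a readable zip -- likely corrupted (or an old-format pickle).")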
Somehow I missed the [Training] [2023-03-23T01:40:04.035070] ModuleNotFoundError: No module named 'dlas'
bit above. You might need to re-run the setup script. If that doesn't fix it I could try…
If you have Notepad++ you can open up those two files, then go to Encoding>Convert to UTF8, save them and see if there's any difference.
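If you don't have Notepad++ handy, here's a rough Python equivalent. It assumes the files are currently in cp1252 (a common Windows default); swap in whatever encoding they're actually in:

# Re-encode train.txt / validation.txt to UTF-8.
# Assumes the current encoding is cp1252 -- change SOURCE_ENCODING if yours differs.
SOURCE_ENCODING = "cp1252"

for name in ("train.txt", "validation.txt"):
    with open(name, "r", encoding=SOURCE_ENCODING, errors="replace") as f:
        text = f.read()
    with open(name, "w", encoding="utf-8") as f:
        f.write(text)
    print(f"Rewrote {name} as UTF-8")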
Huh, that's weird. Here's a log for some training I did earlier today to compare with:
Spawning process: ./train.sh ./training/HyeonSeo/train.yaml
[Training] [2023-03-23T13:12:10.328080]…
And the lr_scheduler one (Though it didn't show in this instance):
This one occurs for me also, but AFAIK it's harmless; the only thing there that looks weird to me is:
[Training]…
Use a small subset then.
With a small subset (8 clips of ~4 seconds each):
1 chunk: https://vocaroo.com/15lY8pR1WRhb
2 chunks: https://vocaroo.com/19R30vtl8gjn
4 chunks: https://vocaroo.c…
Too large. Start small and work your way up.
With a large dataset, smaller values OOM.
...because you need to click (Re)compute Voice Latents when you want to regenerate them.
<face palm emoji>
Anyway, with regenerating the latents between each:
512 chunks:…
Regardless of semantics, the same principle I've preached applies: play around with it.
Okay:
sneed@FMRLYCHKS:~/ai-voice-cloning/results/HyeonSeo$ ll
total 849648
drwxrwxrwx 1 sneed…
Which goes back to the main thing I keep telling you all: play around with the damn voice latent chunk size slider. The defaults will never, ever be a catch-all size.
On the Wiki you…
Likely invalid UTF-8 characters in your train.txt or validation.txt files. Are you training on a language other than English?
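If you want to confirm that before re-encoding anything, here's a small sketch that reads the files as raw bytes and reports which lines fail to decode as UTF-8 (filenames assumed to be in the current directory; adjust as needed):

# Report lines in the dataset files that are not valid UTF-8.
for name in ("train.txt", "validation.txt"):
    with open(name, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                print(f"{name}:{lineno}: invalid UTF-8 ({e})")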