Add another sample so that your batch size is divisible by more numbers. Probably not the "correct" solution but the most expedient one.
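(For a concrete example, assuming the underlying constraint is that the batch size has to divide the dataset size evenly: a 97-clip dataset, 97 being prime, only splits into batches of 1 or 97, while padding it to 98 clips lets batch sizes of 2, 7, 14, or 49 divide in cleanly.)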
You could try restarting with a higher learning rate for fewer iterations and see if it makes a difference.
Ahh, I forgot to mention that, sorry. Appending `-ar 22050` to your ffmpeg command should fix it.
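If it helps, a minimal version of that command looks like this (file names are placeholders; `-ar 22050` just resamples the output to 22,050 Hz):

```bash
# Resample a clip to 22,050 Hz; input/output names are placeholders.
ffmpeg -i input.wav -ar 22050 output_22k.wav
```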
I haven't run into that error before. Does running `ffprobe` on the wav files reveal any errors?
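Something like this (a quick loop I'd sketch myself, not anything the repo ships) will flag any wav that `ffprobe` complains about:

```bash
# With -v error, ffprobe stays silent on healthy files,
# so any output at all marks a problem file.
for f in *.wav; do
  err=$(ffprobe -v error "$f" 2>&1)
  [ -n "$err" ] && echo "problem with $f: $err"
done
```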
If your dataset has a thick accent, you might need to check the transcriptions to make sure they're accurate.
The most important thing is that when you do the transcription with `whisperx` you specify `--align_model WAV2VEC2_ASR_LARGE_LV60K_960H`, or else the timestamps are going to be inaccurate. I have…
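For reference, the full invocation I'd expect looks roughly like this; the model choice and file name are placeholders, and the whisperx CLI has shifted between releases, so check `whisperx --help` against your install:

```bash
# Transcribe with a wav2vec2 alignment model for accurate word timestamps.
whisperx --model large --align_model WAV2VEC2_ASR_LARGE_LV60K_960H audio.wav
```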
So, do you do it manually?
Semi-manually: I use `whisperx` to produce a timestamped transcription and then feed the timestamps into `ffmpeg` to cut things to size.
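As a sketch of the cutting step (timestamps and file names are invented; `-ss`/`-to` are the standard ffmpeg trim flags):

```bash
# Cut one segment out of the source wav; 12.34 and 15.02 stand in for
# a single segment's start/end times taken from the whisperx output.
ffmpeg -i source.wav -ss 12.34 -to 15.02 -c copy clip_0001.wav
```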
Also, do you recommend… Or is it more iterations per epoch?
AIUI more iterations per epoch just means a smaller batch size.
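E.g. a 400-line dataset at batch size 100 is 4 iterations per epoch; dropping the batch size to 50 doubles that to 8, but each epoch still sees the same 400 lines once.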
Are multiple wavs required in the vocals folder, or just one good example?
Just…
I'd restart the training with a clean dataset, just to be sure.
so, how would you install? right in the main ai-voice-cloning directory?
I do `pip install git+https://github.com/m-bain/whisperx.git` in my home directory, but keep in mind I've never used…
I wouldn't try cloning them in a subdirectory of ai-voice-cloning, just in case the different venvs conflict.
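If you want to be extra careful, a throwaway venv keeps whisperx's dependencies fully isolated (the path is arbitrary):

```bash
# Install whisperx in its own venv so its dependencies can't clash
# with ai-voice-cloning's.
python -m venv ~/whisperx-venv
source ~/whisperx-venv/bin/activate
pip install git+https://github.com/m-bain/whisperx.git
```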
You can select which GPU to use, but CPU allocation is up to the OS, I think.
Can whisper run through a batch of single files with the same level of convenience?
Kind of. You can specify multiple files when you run it, ex: `whisperx --model large --task transcribe…`
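And if listing files on one command line gets unwieldy, a shell loop does the same job (same caveat about flags varying between whisperx versions):

```bash
# Transcribe every wav in the folder, one file at a time.
for f in ./dataset/*.wav; do
  whisperx --model large --task transcribe "$f"
done
```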
Can we also use `english_cleaners` to train non-English languages, or is modification necessary?
See the proviso in `modules/dlas/dlas/models/audio/tts/tacotron2/text/cleaners.py`:
"""…
Haven't run into it. Can you run `whisper` outside the script, and if so, does it do the same thing?
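By "outside the script" I mean something as bare as this, just to see whether the failure reproduces (the file path is a placeholder):

```bash
# Run whisper directly on one known-good clip to isolate the error.
whisper --model large /path/to/one_clip.wav
```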
Did you run the appropriate setup script?
You can check out `models/tokenizers/japanese.json` for an example of how to do it; however, because Japanese rules for syllable construction are far more limited, you've got your work cut out for…
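I haven't built one of these myself, so take this purely as a shape sketch: assuming japanese.json follows the usual Hugging Face tokenizers layout (a `vocab` map plus `merges`), a new language file would look something like this, with every token and index below invented for illustration:

```bash
# Hypothetical skeleton for a new tokenizer file; verify the real field
# layout against models/tokenizers/japanese.json before relying on it.
cat > models/tokenizers/mylang.json <<'EOF'
{
  "model": {
    "type": "BPE",
    "vocab": { "[STOP]": 0, "[UNK]": 1, " ": 2, "a": 3, "ka": 4 },
    "merges": []
  }
}
EOF
```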