Check this out if you haven't https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Training
I have; those settings don't really give me a good point of reference.
I tried redoing it with commit 0231550287 from about 2 weeks ago, and the output was much better; close to the dataset voice. The training ran much faster too.
This repo itself doesn't…
Did redoing it include re-preparing the dataset using the old version? I've had terrible luck with the audio slicing in the newer versions.
Nope. I reused the exact same audio files and…
I'm having issues too. I trained a model on a single-voice dataset with clips normalized to between 1 and 11 seconds, using a recent version of the repo, and got a terrible result: the voice came out way too deep.
I…
Just to clear up my understanding: the recommendation now is not to use all-in-one files, and instead to make sure the post-transcription audio clips are always under 11.6s?
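If that's the rule, it's easy to sanity-check a prepared dataset before training. Here's a small stdlib-only sketch that flags clips over the cutoff; the 11.6s constant and the assumption that slices are PCM WAV files are mine, not from the repo:

```python
import wave

MAX_SECONDS = 11.6  # assumed cutoff from the discussion above

def clip_duration(path: str) -> float:
    """Duration of a PCM WAV clip in seconds, using only the stdlib."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def flag_long_clips(paths):
    """Return the clips that exceed the cutoff and should be re-sliced."""
    return [p for p in paths if clip_duration(p) > MAX_SECONDS]
```

Running `flag_long_clips` over the sliced output would at least tell you whether the slicer is actually honoring the limit.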
How feasible would it be to run sliced audio through Whisper a second time to see if the transcription matches the sliced audio? If the re-transcription doesn't match, you can throw out that slice.
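The verification pass itself could look something like this. The comparison function is a plain stdlib sketch; the commented-out Whisper call, the model size, and the file paths are assumptions for illustration, not the repo's actual API:

```python
import difflib
import re

def transcripts_match(original: str, recheck: str, threshold: float = 0.85) -> bool:
    """Compare the dataset's transcript line against a re-transcription of the
    sliced clip. A similarity below the threshold flags a bad slice; the
    0.85 cutoff is a guess and would need tuning."""
    normalize = lambda s: re.sub(r"[^a-z0-9 ]", "", s.lower()).split()
    ratio = difflib.SequenceMatcher(None, normalize(original), normalize(recheck)).ratio()
    return ratio >= threshold

# Hypothetical second pass (names and paths are assumptions):
# import whisper
# model = whisper.load_model("base")
# recheck = model.transcribe("slices/clip_0001.wav")["text"]
# if not transcripts_match(dataset_line, recheck):
#     pass  # discard clip_0001.wav from the training set
```

Token-level comparison with some normalization seems safer than exact string matching, since Whisper's punctuation and casing vary between runs even on a clean slice.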
The biggest problem I'm having with transcription, even when using WhisperX, is that the transcribed text contains a full sentence while the audio has the first or last word or two cut off.…