Audio artifacts/repetitive words after training #296
Reference: mrq/ai-voice-cloning#296
Hey, I'm training a voice, and under training/MyVoice/audio I have about 936 files whose lengths vary quite a lot. A lot of them are only a second or two long, many run to around 7 seconds, and a few go to around 20 seconds.
I'm adding the graphs of my training below
The output audio is not ideal. To get stable audio, I'm reducing the randomness as much as possible and increasing the penalties for length and repetition, but I still get audio artefacts. For example, if I type "How was your day?", the audio comes out like "How was your day? How was your day? How waaaaass ahhhhh".
Any idea of what I'm doing wrong? Is it my audio data? Am I training for too long?
Take a look at this, it might be helpful for diagnosing your issue: #82 (comment)
I wouldn't be surprised if those artifacts and weirdness come from training on so many short samples.
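To see how much of the dataset is made up of very short clips, something like the sketch below could help. It assumes the clips are plain PCM WAV files (which Python's standard `wave` module can read) sitting directly in one folder; the function names and the 2-second cutoff are illustrative choices, not anything the training tool defines.

```python
import wave
from pathlib import Path


def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize_durations(audio_dir, short_cutoff=2.0):
    """Count the clips in audio_dir and list those shorter than short_cutoff.

    short_cutoff is an illustrative threshold, not a recommended value.
    """
    durations = {
        p.name: clip_seconds(p) for p in sorted(Path(audio_dir).glob("*.wav"))
    }
    short = sorted(name for name, secs in durations.items() if secs < short_cutoff)
    return {
        "total": len(durations),
        "short": short,
        "short_fraction": len(short) / len(durations) if durations else 0.0,
    }
```

Calling `summarize_durations("training/MyVoice/audio")` would give the share of clips under the cutoff, which is a quick way to judge whether short samples dominate the set.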
I would also increase the temperature to over 0.5 and the diffusion temperature to above 0.75, decrease the cond-free K, and increase the CVVP, but YMMV depending on the model and inference samples.
Repetitive words usually mean the audio and train.txt don't match, which happens a lot even with WhisperX.
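One rough way to hunt for mismatched pairs is to compare each transcript's length against its clip's duration and flag implausible speaking rates. The sketch below assumes train.txt holds one `relative/path.wav|transcript` pair per line, with paths relative to the folder containing train.txt, and that the clips are PCM WAV. The function names and the characters-per-second bounds are hypothetical and would need tuning per voice and language; it only catches gross mismatches, not subtle transcription errors.

```python
import wave
from pathlib import Path


def load_transcripts(train_txt):
    """Parse 'relative/path.wav|transcript' lines into (path, text) pairs."""
    pairs = []
    for line in Path(train_txt).read_text(encoding="utf-8").splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        rel_path, text = line.split("|", 1)
        pairs.append((rel_path.strip(), text.strip()))
    return pairs


def flag_suspect_clips(train_txt, min_cps=5.0, max_cps=25.0):
    """Return (path, chars_per_second) for clips outside the plausible range.

    min_cps and max_cps are illustrative bounds, not established values.
    """
    root = Path(train_txt).parent
    suspects = []
    for rel_path, text in load_transcripts(train_txt):
        with wave.open(str(root / rel_path), "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        cps = len(text) / seconds if seconds > 0 else float("inf")
        if not (min_cps <= cps <= max_cps):
            suspects.append((rel_path, round(cps, 1)))
    return suspects
```

Any clip this flags is worth listening to alongside its transcript line; a very low rate suggests truncated text, and a very high one suggests text that runs past the end of the audio.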