Need help: trying to train a voice, but it just sounds like a generic male TTS voice #129
Reference: mrq/ai-voice-cloning#129
Attached are the voice files I'm using for training, as well as the results I'm getting. The output doesn't come close to matching the target voice.
Here are some of the files I'm using and the results:
https://mega.nz/folder/NzhS0R5A#-uzK5nMySVFH1GetYDa-5A
I have the same problem. I created a Merida dataset with WAV files longer than 0.6 seconds and shorter than 11 seconds, and trained it up to 2160 steps. With 29 seconds' worth of samples, the output files have a generic British accent instead of a Scottish one. If I add more samples, the output files have an American accent.
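One way to double-check a dataset against the 0.6 to 11 second window and see how much audio it actually contains is a quick duration audit. This is a minimal sketch using only Python's standard `wave` module, assuming the clips are uncompressed PCM WAV files in a single folder; `audit_dataset` is a hypothetical helper, not part of ai-voice-cloning.

```python
import wave
from pathlib import Path

# Duration window described in the post (exclusive bounds).
MIN_SECONDS = 0.6
MAX_SECONDS = 11.0


def clip_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def audit_dataset(folder, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """Split *.wav files into kept/rejected by duration and total the kept audio.

    Returns (kept, rejected, total_seconds), where kept and rejected are
    lists of (file name, duration in seconds) sorted by file name.
    """
    kept, rejected = [], []
    for path in sorted(Path(folder).glob("*.wav")):
        seconds = clip_duration(path)
        bucket = kept if min_s < seconds < max_s else rejected
        bucket.append((path.name, seconds))
    total_seconds = sum(seconds for _, seconds in kept)
    return kept, rejected, total_seconds
```

For example, `kept, rejected, total = audit_dataset("path/to/merida")` (hypothetical path) lists any clips outside the window and reports the usable total, which makes it easy to compare a 29-second dataset with a larger one.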
Dataset, including the train.txt file:
https://files.catbox.moe/u0esfz.zip
Model
https://pixeldrain.com/u/pKEfJPdV
Edit: I got it to work by following the suggested training settings linked below. The only thing I changed was setting epochs to 250, after reading the /vsg/ (AI Voice Synthesis General) archives. I have 204 WAV files totaling 10 minutes and 40 seconds.
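The earlier post counts training in steps, while the fix here is expressed in epochs. As a rough conversion, one epoch is one pass over all clips, so the step count depends on the batch size, which this thread does not state. The sketch below assumes a hypothetical batch size and assumes the trainer either keeps or drops the final partial batch; check the actual training configuration before relying on these numbers.

```python
import math


def steps_per_epoch(num_clips, batch_size, drop_last=False):
    """Optimizer steps in one pass over the dataset.

    drop_last=True models a loader that discards the final incomplete
    batch; whether the real trainer does this is an assumption.
    """
    if drop_last:
        return num_clips // batch_size
    return math.ceil(num_clips / batch_size)


def total_steps(num_clips, batch_size, epochs, drop_last=False):
    """Total optimizer steps for a given number of epochs."""
    return steps_per_epoch(num_clips, batch_size, drop_last) * epochs


# 204 clips and 250 epochs come from the post; 128 is a hypothetical batch size.
print(total_steps(204, 128, 250))                  # 2 steps/epoch -> 500
print(total_steps(204, 128, 250, drop_last=True))  # 1 step/epoch  -> 250
```

The same arithmetic shows why a step count alone (such as 2160) says little about how many times the model saw each clip: that depends on both dataset size and batch size.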
https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Training#suggested-settings
https://desuarchive.org/g/thread/91867084/#q91878556
Edit 2: I made a new model with louder WAV files, using the same settings as the previous model, and for some reason the accent came out American instead of Scottish. Then I made a new model with the learning rate set to 1e-3, and the sound quality of the output files was terrible. So I made another model with the learning rate set to 1e-4: a huge improvement.