"Saved" voice is wildly inconsistent #211
Reference: mrq/ai-voice-cloning#211
I want to start by saying that I know I've got no knowledge whatsoever here. Normally I try to work out what I'm doing wrong before filing an issue, but I can't find any community, forum, Discord server, or anywhere else where you'll actually get an answer on this subject.
So I'm on the fence about whether this is a real issue or just me doing something wrong and making bad assumptions.
I understand that this project is mainly for voice cloning, but for now I'm fine with just using a synthesized default voice. Sometimes the "random" voice option creates a very nice voice, and I'd like to "save" it so I can reuse it. I assumed that copying the cond_latents file out of the random folder (after it's generated by the 'random' voice option) and moving it into its own 'custom voice' directory would let me select that voice. But every time I generate from that "saved" voice, it sounds like a new voice (much like the random voice option does as well), and it even jumps from female to male, etc.
So, am I doing something wrong (e.g. I shouldn't use the random voice and expect it to be stable), or was I wrong to assume this is irregular behaviour? Is it expected that a voice will often just suddenly become a female speaker, etc.?
And lastly, since I don't know where else someone could answer this: if I want one model that I can prompt with 9 different voices, is that something this repo is suitable for? So far I've only seen finetunes made for one particular voice. If I want to train for my use case, do I need to mix all 9 of my speakers into the same training dataset?
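For context, the "save the random voice" workflow described above amounts to copying the generated latents file into a new voice folder. A minimal sketch, where every path is an assumption (stand-in names, not the repo's actual layout):

```shell
# Sketch only: directory and file names below are hypothetical stand-ins.
mkdir -p random voices/my_saved_voice

# Stand-in for the latents the 'random' voice option generates.
touch random/cond_latents.pth

# Copy the latents into a custom voice directory so it appears as a
# selectable voice.
cp random/cond_latents.pth voices/my_saved_voice/
ls voices/my_saved_voice
```

Note that copying the latents alone is not enough if generation is still being seeded randomly, which is the point raised in the reply below.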
Did you reuse the same seed?
I did not, and I'm now left wondering how I couldn't have thought about that before writing this entire post.
So thanks for that.
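The seed point above can be sketched in a few lines. This is purely illustrative (not this repo's actual API): a fixed seed makes the "random" draws reproducible, while a fresh seed per run yields a new voice each time.

```python
import random

def fake_random_voice(seed):
    # Hypothetical stand-in for sampling a "random" voice: with the same
    # seed, the generator produces identical draws every time.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

# Same seed -> same "voice"; different seed -> different "voice".
assert fake_random_voice(42) == fake_random_voice(42)
assert fake_random_voice(42) != fake_random_voice(7)
```

So reusing the seed from the run that produced the nice random voice, together with its saved latents, is what makes the result repeatable.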