"Saved" voice is wildly inconsistent #211

Closed
opened 2023-04-19 13:54:52 +00:00 by FrioGlakka · 2 comments

I want to start by saying that I know I have no real knowledge of this field. Normally I try to find out what I'm doing wrong before filing an issue, but I can't find any community, forum, Discord server or anything else where this subject actually gets answered.

So I'm on the fence about whether this is a real issue or just me doing something wrong and making wrong assumptions.

I understand that this is mainly for voice cloning, but for now I'm fine with just using a synthesized default voice. Sometimes the "random" voice option creates a very nice voice, and I want to "save" it so I can reuse it. I assumed that copying the cond_latents file from the random folder (after it's generated by using the "random" voice option) into its own custom voice directory would let me select that voice. But every time I generate from that "saved" voice, it sounds like a new voice, much like the random voice option does. It even jumps from female to male.

So, am I doing something wrong (for example, I shouldn't use the random voice and expect it to be consistent), or was I wrong to assume this behaviour is abnormal? Is it expected that a voice will often suddenly switch to a female speaker, etc.?

And lastly, since I don't know where else to ask: if I want one model that I can prompt with 9 different voices, is this repo suitable for that? So far I've only seen finetunes made for one particular voice. If I want to train for my use case, do I need to mix all 9 of my speakers into the same training dataset?


Did you reuse the same seed?

Author

I did not, and now I'm left wondering how I didn't think of that before writing this entire post.

So thanks for that.
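Why the seed matters here can be shown with a generic sketch. The repository's actual sampling code isn't shown in this thread, so this assumes generation is a stochastic sampling step driven by a seeded pseudo-random generator; `generate` is a hypothetical stand-in, not the tool's API. Fixing the conditioning input (the copied cond_latents) is not enough on its own: without the same seed, the random draws differ on every run.

```python
import random

def generate(cond_latents, seed=None):
    # Hypothetical stand-in for a generation call. The saved latents fix the
    # conditioning input, but sampling still draws from a PRNG. With seed=None
    # the generator is seeded from system entropy, so each call differs.
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, 1.0) for x in cond_latents]

latents = [0.1, 0.2, 0.3]  # stand-in for a saved cond_latents file

# Same latents and same seed: identical output on every run.
print(generate(latents, seed=1234) == generate(latents, seed=1234))  # True

# Same latents, different seeds: the output changes.
print(generate(latents, seed=1) == generate(latents, seed=2))  # False
```

In other words, reusing a "random" voice reliably means keeping both the latents and the seed that produced the output you liked.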

Reference: mrq/ai-voice-cloning#211