"Saved" voice is wildly inconsistent #211
Reference: mrq/ai-voice-cloning#211
I want to start by saying that I know I've got no knowledge whatsoever here. Normally I try to work out what I'm doing wrong before filing an issue, but I can't find any community, forum, Discord server, or anywhere else where you'll actually get an answer on this subject.
So I'm on the fence about whether this is a real issue or just me doing something wrong and making bad assumptions.
I understand that this project is mainly for voice cloning, but for now I'm fine with just using a synthesized default voice. Sometimes the "random" voice option creates a very nice voice, and I'd like to "save" it so I can reuse it. I assumed that copying the cond_latents file out of the random folder (after it's generated by the 'random' voice option) and moving it into its own 'custom voice' directory would let me select that voice. But every time I generate from that "saved" voice, it sounds like a new voice (much like the random voice option does as well), and it even jumps from female to male, etc.
So, am I doing something wrong (e.g. I shouldn't use the random voice and expect it to be stable), or was I wrong to assume this is irregular behaviour? Is it expected that a voice will often just suddenly become a female speaker, etc.?
And lastly, since I don't know where else someone could answer this: if I want one model that I can prompt with 9 different voices, is that something this repo is suitable for? So far I've only seen finetunes made for one particular voice. If I want to train for my use case, do I need to mix all 9 of my speakers into the same training dataset?
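For context, the "save the random voice" workflow described above amounts to copying the generated latents file into a new voice folder. A minimal sketch, where every path is an assumption (stand-in names, not the repo's actual layout):

```shell
# Sketch only: directory and file names below are hypothetical stand-ins.
mkdir -p random voices/my_saved_voice

# Stand-in for the latents the 'random' voice option generates.
touch random/cond_latents.pth

# Copy the latents into a custom voice directory so it appears as a
# selectable voice.
cp random/cond_latents.pth voices/my_saved_voice/
ls voices/my_saved_voice
```

Note that copying the latents alone is not enough if generation is still being seeded randomly, which is the point raised in the reply below.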
Did you reuse the same seed?
I did not, and I'm now left wondering how I couldn't have thought about that before writing this entire post.
So thanks for that.
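The seed point above can be sketched in a few lines. This is purely illustrative (not this repo's actual API): a fixed seed makes the "random" draws reproducible, while a fresh seed per run yields a new voice each time.

```python
import random

def fake_random_voice(seed):
    # Hypothetical stand-in for sampling a "random" voice: with the same
    # seed, the generator produces identical draws every time.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

# Same seed -> same "voice"; different seed -> different "voice".
assert fake_random_voice(42) == fake_random_voice(42)
assert fake_random_voice(42) != fake_random_voice(7)
```

So reusing the seed from the run that produced the nice random voice, together with its saved latents, is what makes the result repeatable.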