Is there any way to save the voice from a random generation? #264

Open
opened 2023-06-12 21:45:49 +00:00 by wiznat · 2 comments

I noticed that in the original TTTS repo the author notes:

Random voice :
For the those in the ML space: this is created by projecting a random vector onto the voice conditioning latent space.

Full disclosure, I am not well versed in ML, but is there any hacky way I could save whatever this random vector is and create a voice from it?

If so, where in the code does the random generation happen?
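To illustrate why saving the random vector (or the seed that produced it) would be enough to get the same latent back, here is a toy sketch in plain Python. It is not the TorToiSe code: `random_vector`, `project`, and `WEIGHTS` are hypothetical names, and the "projection" is just a fixed matrix-vector product standing in for whatever the real model does.

```python
import random


def random_vector(seed, dim=8):
    """Draw a reproducible Gaussian vector from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project(vector, weights):
    """Toy stand-in for the projection: a plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


# Hypothetical fixed weights (4 x 8); a real model would use learned weights.
_weight_rng = random.Random(0)
WEIGHTS = [[_weight_rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]

# Same seed -> same random vector -> same "voice latent".
latent_a = project(random_vector(seed=1234), WEIGHTS)
latent_b = project(random_vector(seed=1234), WEIGHTS)

# A different seed gives a different latent, i.e. a different "voice".
latent_c = project(random_vector(seed=4321), WEIGHTS)
```

The point of the sketch is only that the projection itself is deterministic, so the randomness lives entirely in the input vector.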

Owner

If you have `Embed Output Metadata` enabled in settings, the latents used for that generation are "embedded" into the result sound file. You can take that into `Utilities > Import / Analyze` and the web UI can rip the latents back out, and you can place it in a new folder under `./voices/`.

However, if I remember right, they're rather *sensitive* and tend not to sound all that similar across generations, so your mileage will vary if you reroll the dice for a new voice and want to keep using it.
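The last step, placing the extracted latents in a new folder under `./voices/`, can be done by hand, or with something like the sketch below. The helper name `install_voice_file`, the example file name, and the voice name are all made up for illustration; use whatever file the web UI actually gives you.

```python
import shutil
from pathlib import Path


def install_voice_file(src, voice_name, voices_dir="./voices"):
    """Copy a file (e.g. extracted latents) into <voices_dir>/<voice_name>/."""
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(src)
    # Create the new voice folder if it does not exist yet.
    target_dir = Path(voices_dir) / voice_name
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps; the original file is left in place.
    return Path(shutil.copy2(src, target_dir / src.name))
```

For example, `install_voice_file("cond_latents.pth", "my_random_voice")` would create `./voices/my_random_voice/` and copy the file into it (both names here are hypothetical).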


I can testify that the saved random voices aren't "solid", or whatever you'd call it.

What I mean is that I had some nice random voices saved, and sometimes they would change gender when generating audio.

So your best bet would probably be to generate as much audio as possible from the random voice you like, and then use those clips to generate latents as a custom voice?
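A rough sketch of the collecting step, assuming the generated clips are plain PCM WAV files sitting in one folder. The function names, the folder arguments, and the `min_seconds` cutoff are my own assumptions, not anything the web UI provides.

```python
import shutil
import wave
from pathlib import Path


def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def collect_clips(results_dir, voice_dir, min_seconds=1.0):
    """Copy WAV clips of at least min_seconds into voice_dir.

    Returns the total duration (in seconds) of the clips that were copied.
    """
    voice_dir = Path(voice_dir)
    voice_dir.mkdir(parents=True, exist_ok=True)
    total = 0.0
    for clip in sorted(Path(results_dir).glob("*.wav")):
        seconds = clip_seconds(clip)
        # Skip very short clips; the cutoff is an arbitrary choice.
        if seconds >= min_seconds:
            shutil.copy2(clip, voice_dir / clip.name)
            total += seconds
    return total
```

Calling `collect_clips(some_results_folder, "./voices/my_random_voice")` would fill the voice folder and tell you how many seconds of audio you gathered. You would still want to listen to the clips first and drop the ones where the voice drifted.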

Reference: mrq/ai-voice-cloning#264