Is there any way to save the voice from a random generation? #264
Reference: mrq/ai-voice-cloning#264
I noticed that in the original TorToiSe TTS repo the author notes:
Random voice:
For those in the ML space: this is created by projecting a random vector onto the voice conditioning latent space.
Full disclosure, I am not well versed in ML, but is there any hacky way I could save whatever this random vector is and create a voice with it?
If so, where in the code does the random generation happen?
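To make the quoted idea concrete, here is a toy sketch of "projecting a random vector onto a latent space" and keeping the result for later. It is not the project's code: `make_projection`, `project`, `random_latent`, `save_latent`, `load_latent`, the tiny dimensions, and the JSON file format are all hypothetical stand-ins chosen for illustration. It assumes only that the projection is fixed, in which case saving the projected latent itself is enough to reuse it and the original random vector is not needed.

```python
import json
import random
from pathlib import Path

NOISE_DIM = 3    # toy size of the random input vector (assumption)
LATENT_DIM = 4   # toy size of the latent; real conditioning latents are far larger


def make_projection(seed=0):
    """Stand-in for a trained, fixed projection (NOISE_DIM x LATENT_DIM matrix)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(NOISE_DIM)]


def project(projection, noise):
    """Multiply the noise row vector by the projection matrix."""
    return [sum(n * row[j] for n, row in zip(noise, projection))
            for j in range(LATENT_DIM)]


def random_latent(projection, rng):
    """Draw a random vector, project it, and return both."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return noise, project(projection, noise)


def save_latent(latent, path):
    """Persist the latent so the same 'voice' can be reused later."""
    Path(path).write_text(json.dumps(latent))


def load_latent(path):
    """Read back a latent written by save_latent."""
    return json.loads(Path(path).read_text())
```

In this toy setup, reloading the saved latent gives back exactly the same values, so the part worth saving is the output of the projection rather than the random draw that produced it.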
If you have Embed Output Metadata enabled in settings, the latents used for that generation are "embedded" into the resulting sound file. You can take that file into Utilities > Import / Analyze and the web UI can rip the latents back out, which you can then place in a new folder under ./voices/.
However, if I remember right, they're rather sensitive and tend not to sound all that similar across generations, so your mileage will vary if you're looking to reroll the dice for a new voice and want to keep using it.
I can testify that the saved random voices aren't "solid", or whatever you'd call it.
What I mean is that I had some nice random voices saved, and sometimes they would change gender when generating audio.
So your best bet would probably be to generate as much audio as possible from the random voice you like, and then use those clips as a custom voice to generate latents from.
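The first half of that workflow, gathering the clips you liked into a new voice folder, can be sketched as below. This is only an illustration: `collect_clips` and its parameters are hypothetical, and the location of the generated clips (`results_dir`) is an assumption to adjust to wherever your outputs land. Only the ./voices/ folder comes from this thread. Generating latents from the collected clips is left to the web UI.

```python
import shutil
from pathlib import Path


def collect_clips(results_dir, voices_dir, voice_name, pattern="*.wav"):
    """Copy generated clips into voices_dir/voice_name and return the new paths.

    results_dir is wherever the generated audio was written (assumption);
    voices_dir would be ./voices/ as mentioned above.
    """
    target = Path(voices_dir) / voice_name
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sort for a stable order; only files matching the pattern are copied.
    for clip in sorted(Path(results_dir).glob(pattern)):
        dest = target / clip.name
        shutil.copy2(clip, dest)
        copied.append(dest)
    return copied
```

For example, `collect_clips("path/to/generated/clips", "./voices", "my_random_voice")` would create ./voices/my_random_voice/ and fill it with the matching clips, which can then be picked as a custom voice.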