Two randoms in the list, shifting random voices #98
Reference: mrq/ai-voice-cloning#98
I ran update.sh just now.
Two "random" entries appear in the voice list after clicking Refresh Voice. Clicking either one selects both of them; it doesn't matter which one you click.
The random voice also changes with each generated line (lines separated by \n).
So each of these lines gets a new voice:
this is a test. \n
of the emergency broadcasting system \n
it is only a test \n
Similarly, setting generations to 3 with this input:
This is a test
will give you a different voice for each of the three readings.
Oh right. I suppose it's a side effect of how I made it possible to embed a random voice's latents into the output. Remedied in dc1902b91c.
Gross, I'll need to validate it again. It worked earlier, but I wonder if I introduced a regression.
Ah, I think I understand what causes it.
The latents and seed used most definitely persist between lines, but any slight variation in the input lines will cause it to diverge pretty fast, even at low temperatures like 0.05.
Unfortunately, there's not much I can do about it.
I suppose implementing loopback (take the output, compute latents against it, and use those for subsequent generations) could maybe fix it, but I'm not too sure it would. And honestly, it seems like a bit of work to implement for a very small use-case, especially when it's technically still beyond the scope of AIVC (voice cloning, not voice synthesis). I can't imagine another use-case for it either, since it's not like Stable Diffusion, where loopback img2img has applications.
If you insist on retaining a random voice, I suggest doing manual loopback: generate one line, then treat that output as a voice input. If it somehow works without much quality degradation, maybe I'll implement it.
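For anyone wanting to try the manual loopback idea, here is a rough sketch of the control flow only. It is not AIVC code: `manual_loopback`, `generate`, and `compute_latents` are hypothetical names, and the two callables are placeholders you would have to wire up to whatever generation and latent-computation steps you actually use. It assumes that passing `None` as the latents means "pick a random voice".

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Audio = TypeVar("Audio")      # whatever your generation step returns
Latents = TypeVar("Latents")  # whatever your conditioning step returns


def manual_loopback(
    lines: Sequence[str],
    generate: Callable[[str, Optional[Latents]], Audio],
    compute_latents: Callable[[Audio], Latents],
) -> List[Audio]:
    """Generate each line, reusing latents computed from the first output.

    Both callables are hypothetical stand-ins supplied by the caller:
    - generate(text, latents): latents=None is assumed to mean "random voice".
    - compute_latents(audio): treats a generated clip as a voice input.
    """
    outputs: List[Audio] = []
    latents: Optional[Latents] = None  # first line uses the random voice

    for line in lines:
        audio = generate(line, latents)
        outputs.append(audio)
        if latents is None:
            # Loop back once: condition every later line on the first output.
            latents = compute_latents(audio)

    return outputs
```

Whether the voice actually stays consistent, and how much quality is lost by conditioning on generated audio, is exactly the open question above; the sketch only pins down the order of operations.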