Two randoms in the list, shifting random voices #98


st33lmouse · 2023-03-09T03:58:26Z

st33lmouse commented

2023-03-09 03:58:26 +00:00

Ran update.sh just now.

Two randoms appear in the voice list after clicking refresh voice. Clicking on one selects them both. It doesn't matter which one you click on.

Random voice changes with each generation of a line \n
So each of these lines gets a new voice:

this is a test. \n
of the emergency broadcasting system \n
it is only a test \n

Similarly, setting generations to 3 with this input:
This is a test

will give you three different voices for each reading.


mrq commented

2023-03-09 04:24:43 +00:00

> Two randoms appear in the voice list after clicking refresh voice. Clicking on one selects them both. It doesn't matter which one you click on.

Oh right. I suppose it's a side-effect from how I make embedding a random voice's latents into the output possible. Remedied in dc1902b91c.

> Random voice changes with each generation of a line \n

Gross, I'll need to validate it again. It worked earlier, but I wonder if I made a regression happen.


mrq commented

2023-03-09 04:57:32 +00:00

Ah, I think I understand what causes it.

The latents and seed used most definitely persist between lines, but any slight variation in the input lines will cause it to diverge pretty fast, even at low temperatures like 0.05.

Unfortunately, there's not much I can do about it.

I suppose implementing loopback (take the output, compute latents against it and use that for subsequent generations) could maybe fix it, but I'm not too sure if it would. And desu it seems like a bit of work to implement for a very small use-case, especially when it's technically still beyond the scope of AIVC (voice cloning, not voice synthesis), and I can't imagine another use-case for it, since it's not like Stable Diffusion where loopback img2img has applications.

If you insist on retaining a random voice, I suggest doing manual loopback, where you generate one line, and then treat it as a voice input. If it somehow works without much of a quality degradation, maybe I'll implement it.
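For anyone who wants to try the manual loopback described above, here is a minimal sketch of the idea in Python. It is not AIVC's implementation: `generate` and `compute_latents` are hypothetical callables that you would wire up to whatever TTS backend you use, and the sketch assumes the first line's output is good enough to serve as the voice sample for every later line.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Audio = TypeVar("Audio")
Latents = TypeVar("Latents")


def generate_with_loopback(
    lines: Sequence[str],
    generate: Callable[[str, Optional[Latents]], Audio],
    compute_latents: Callable[[Audio], Latents],
) -> List[Audio]:
    """Generate each line, reusing latents computed from the first output.

    `generate(text, latents)` and `compute_latents(audio)` are hypothetical
    hooks supplied by the caller; they are not part of any real library API.
    """
    outputs: List[Audio] = []
    latents: Optional[Latents] = None  # None means "let the backend pick a random voice"

    for index, line in enumerate(lines):
        audio = generate(line, latents)
        outputs.append(audio)

        # Loopback step: after the first line, treat its output as the
        # voice input and compute latents from it once. All subsequent
        # lines are conditioned on those same latents.
        if index == 0:
            latents = compute_latents(audio)

    return outputs
```

The first call runs with no latents (a random voice), and every later call is conditioned on latents derived from that first clip. Whether this actually keeps the voice stable without noticeable quality loss is exactly the open question raised above.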


mrq closed this issue

2023-03-09 04:57:32 +00:00
