CVVP latents missing #80

Closed
opened 2023-03-07 07:35:42 +00:00 by yqxtqymn · 6 comments

Since the recent update, the code won't calculate CVVP latents.

> Requesting weighing against CVVP weight, but voice latents are missing some extra data. Please regenerate your voice latents.

I tried regenerating the latents with a CVVP weight of 1; bigvgan and univnet both produce the same problem.

Owner

Per [the wiki](https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Settings#settings):

> Slimmer Computed Latents: falls back to the original, 12.9KiB way of storing latents (without the extra bits required for using the CVVP model).

You shouldn't even be using the CVVP.
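
As a rough illustration of why the error appears, the sketch below models a latents object that either keeps or drops the extra data needed for CVVP comparison. The names (`VoiceLatents`, `cvvp_conds`, `check_cvvp_ready`) are hypothetical and not taken from the repository; only the error text comes from this issue.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VoiceLatents:
    """Hypothetical container for computed voice latents."""
    autoregressive: Sequence[float]
    diffusion: Sequence[float]
    # Assumed field: extra conditioning data needed for CVVP comparison.
    # A "slim" latents file would leave this as None.
    cvvp_conds: Optional[Sequence[float]] = None


def check_cvvp_ready(latents: VoiceLatents, cvvp_weight: float) -> Optional[str]:
    """Return an error message if CVVP is requested but its data is missing."""
    if cvvp_weight > 0 and latents.cvvp_conds is None:
        return (
            "Requesting weighing against CVVP weight, but voice latents are "
            "missing some extra data. Please regenerate your voice latents."
        )
    return None


# Slim latents omit the extra data; full latents carry it.
slim = VoiceLatents(autoregressive=[0.1], diffusion=[0.2])
full = VoiceLatents(autoregressive=[0.1], diffusion=[0.2], cvvp_conds=[0.3])
```

Under these assumptions, regenerating latents while the slim setting is still enabled would presumably hit the same check again, which is consistent with the behaviour reported above.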
mrq closed this issue 2023-03-07 13:24:24 +00:00
Author

Why shouldn't I use CVVP? I had some good generations with it in the 0.8-1 range.

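
For context on what a weight in the 0.8-1 range would mean, here is a minimal sketch of blending two candidate-ranking scores by a CVVP weight. It assumes a simple linear interpolation between a CLVP score and a CVVP score; the function name `blend_scores` and the sample values are made up for illustration, and the actual code in the repository may differ.

```python
def blend_scores(clvp: float, cvvp: float, cvvp_weight: float) -> float:
    """Linearly interpolate between a CLVP score and a CVVP score.

    Assumption: a weight of 0 ignores CVVP entirely, and a weight of 1
    ranks candidates by CVVP alone.
    """
    if not 0.0 <= cvvp_weight <= 1.0:
        raise ValueError("cvvp_weight must be between 0 and 1")
    return cvvp * cvvp_weight + clvp * (1.0 - cvvp_weight)


# Illustrative values only: at weight 0.8 the CVVP score dominates the ranking.
score = blend_scores(clvp=0.5, cvvp=0.9, cvvp_weight=0.8)
```

With a weight this close to 1, most of the ranking signal would come from CVVP, so the missing CVVP data in the latents matters more than it would at a low weight.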
Owner

Per https://github.com/neonbjb/tortoise-tts#v24-2022517:

> Removed CVVP model. Found that it does not, in fact, make an appreciable difference in the output.

Per https://nonint.com/2022/04/25/tortoise-architectural-design-doc/:

> CVVP’s contribution to Tortoise is minor. It could be entirely omitted and you would still be left with a very good TTS program. I do not have a way to quantify its contribution to Tortoise, but I have subjectively been able to tell outputs that were generated with CVVP in the picture versus those that were generated without CVVP.

Per my own findings, it increases generation time for little to no gain. I tried it while looking for quality uplifts in many of my early tests and found none. If it works for you, great, but it's not something I'll endorse (for lack of a better term).

Anyway, I've made the message more detailed, as it was before I added the setting that governs exporting the data for comparing against the CVVP. The setting isn't on by default because of mrq/tortoise-tts#10 (it greatly increases output size when embedding metadata and latents are enabled, which they are by default).

Author

I appreciate the detailed response. I was not aware that support had been dropped in the original Tortoise.

CVVP's impact is small, but it's nice to have and to play around with when you're trying to get a generation just right.

As far as I can tell, CVVP has been completely broken since yesterday's commit. If it's a small change to get it functional again (even if not officially endorsed), then I would consider it worth it. Of course, it's up to you whether or not you wish to carry it into the future.

If you'd like I could give it a go and see if I can get it working?

Owner

> Slimmer Computed Latents: falls back to the original, 12.9KiB way of storing latents (without the extra bits required for using the CVVP model).

Disable the setting.

Author

I tried it before and it wasn't working for some reason. Anyway it's working now. Thanks! :)

Reference: mrq/ai-voice-cloning#80