Implement BigVGAN #52

Closed
opened 2023-03-03 04:21:28 +07:00 by deviandice · 3 comments

I implemented BigVGAN over here using your fork as a base. It's way better. Go implement it. Credit would be appreciated. Just throw this model file in models/tortoise.
https://disk.yandex.com/d/fOjzTs8HQiFVdg
https://github.com/deviandice/tortoise-tts-BigVGAN

deviandice changed title from BigVGAN to Implement BigVGAN 2023-03-03 04:22:35 +07:00

Naisu, I'll play around with it whenever I get a chance to (probably tomorrow evening).

~~If you don't mind, to make my life a little easier (and it'll retain credit to you in the commit history), can you fork the mrq/tortoise-tts repo, apply your changes to it, then do a pull request? It's not a big deal, I can probably figure out what to slap back in.~~

Ah, it seems fairly simple to re-implement it. I'll play around with it in a separate branch then merge it.

Very nice, implemented in mrq/tortoise-tts commit https://git.ecker.tech/mrq/tortoise-tts/commit/aca32a71f798ebd8487c113d41d1b4e9ee15c315, and added a toggle (default enabled) in commit 740b5587df13f205f02a84113d29f67f3b9a2219.

In some of my comparisons there's definitely a noticeable improvement, but in others it's only slightly perceptible. For example, the treble isn't so bad with it enabled (but you really have to tune your ears to it):

* with BigVGAN: https://files.catbox.moe/e53ngv.wav
* without BigVGAN: https://files.catbox.moe/81nxcr.wav

I know it's not a giant improvement, but it's another nice QoL uplift.
mrq closed this issue 2023-03-03 06:59:44 +07:00
mrq added the enhancement label 2023-03-03 07:00:06 +07:00

Thanks for implementing this so quickly, and it's pretty neato that it's having a noticeable effect.
Reference: mrq/ai-voice-cloning#52