I implemented BigVGAN over here using your fork as a base. It's way better. Go implement it. Credit would be appreciated. Just throw this model file in models/tortoise.
https://disk.yandex.com/d/fOjzTs8HQiFVdg
https://github.com/deviandice/tortoise-tts-BigVGAN
deviandice changed title from "BigVGAN" to "Implement BigVGAN" (2023-03-03 04:22:35 +07:00)
Naisu, I'll play around with it whenever I get a chance to (probably tomorrow evening).
~~If you don't mind, to make my life a little easier (and it'll retain credit to you in the commit history), can you fork the mrq/tortoise-tts repo, apply your changes to it, then do a pull request? It's not a big deal, I can probably figure out what to slap back in.~~ Ah, it seems fairly simple to re-implement it. I'll play around with it in a separate branch, then merge it.
Very nice, implemented in mrq/tortoise-tts commit https://git.ecker.tech/mrq/tortoise-tts/commit/aca32a71f798ebd8487c113d41d1b4e9ee15c315, and added a toggle (default enabled) in commit 740b5587df13f205f02a84113d29f67f3b9a2219.
In some of my comparisons there's definitely a noticeable improvement, but in others it's only slightly perceptible. For example, the treble isn't as bad with it enabled (though you really have to tune your ears to hear it):
* with BigVGAN: https://files.catbox.moe/e53ngv.wav
* without BigVGAN: https://files.catbox.moe/81nxcr.wav
I know it's not a giant improvement, but it's another nice QoL uplift.
Thanks for implementing this so quickly, and it's pretty neato that it's having a noticeable effect.