forked from mrq/tortoise-tts
8139afd0e5
After training a similar model for a different purpose, I realized that this model is faulty: the contrastive loss it uses only pays attention to high-frequency details which do not contribute meaningfully to output quality. I validated this by comparing a no-CVVP output with a baseline using tts-scores and found no differences.

| File |
|---|
| `__init__.py` |
| `arch_util.py` |
| `autoregressive.py` |
| `classifier.py` |
| `clvp.py` |
| `diffusion_decoder.py` |
| `random_latent_generator.py` |
| `transformer.py` |
| `vocoder.py` |
| `xtransformers.py` |