forked from mrq/tortoise-tts
8fdf516e62
After training a similar model for a different purpose, I realized that this model is faulty: the contrastive loss it uses attends only to high-frequency details, which do not contribute meaningfully to output quality. I validated this by comparing no-CVVP output against a baseline using tts-scores and found no differences.
data
models
utils
__init__.py
api.py
do_tts.py
eval.py
get_conditioning_latents.py
is_this_from_tortoise.py
read.py