forked from mrq/tortoise-tts
Commit Graph

16 Commits

Author SHA1 Message Date
mrq
1eb92a1236 QoL fixes 2023-02-02 21:13:28 +00:00
Kian-Meng Ang
49bbdd597e Fix typos
Found via `codespell -S *.json -L splitted,nd,ser,broadcat`
2023-01-06 11:04:36 +08:00
Johan Nordberg
0ca4d8f291 Revive CVVP model 2022-05-25 10:22:50 +00:00
James Betker
e0be49f02f Fix bug 2022-05-22 05:50:26 -06:00
James Betker
42a3bc9cfd Support combining voices in do_tts 2022-05-22 05:28:15 -06:00
James Betker
a1c131bde9 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	tortoise/read.py
2022-05-19 10:34:54 -06:00
Johan Nordberg
00730d2786 Allow setting models path from environment variable 2022-05-19 21:02:09 +09:00
James Betker
8fdf516e62 Remove CVVP
After training a similar model for a different purpose, I realized that
this model is faulty: the contrastive loss it uses only pays attention
to high-frequency details which do not contribute meaningfully to
output quality. I validated this by comparing a no-CVVP output with
a baseline using tts-scores and found no differences.
2022-05-17 12:21:25 -06:00
James Betker
a1ae84c49d Add a way to get deterministic behavior from tortoise and add debug states for reporting 2022-05-17 12:11:18 -06:00
James Betker
fc7b308e3b Add support for multiple output candidates in do_tts. 2022-05-12 11:25:35 -06:00
James Betker
12acac6f77 Fix default output path 2022-05-02 21:37:39 -06:00
James Betker
5663e98904 misc fixes 2022-05-02 18:00:57 -06:00
James Betker
ee24d3ee4b Support totally random voices (and make fixes to previous changes) 2022-05-02 15:40:03 -06:00
James Betker
66805da4bd add support for specifying the model_dir 2022-05-01 17:29:25 -06:00
James Betker
01b783fc02 Add support for extracting and feeding conditioning latents directly into the model
- Adds a new script and API endpoints for doing this
- Reworks autoregressive and diffusion models so that the conditioning is computed separately (which will actually provide a mild performance boost)
- Updates README

This is untested. Need to do the following manual tests (and someday write unit tests for this behemoth before
it becomes a problem..)
1) Does get_conditioning_latents.py work?
2) Can I feed those latents back into the model by creating a new voice?
3) Can I still mix and match voices (both with conditioning latents and normal voices) with read.py?
2022-05-01 17:25:18 -06:00
James Betker
23a3d5d00b Move everything into the tortoise/ subdirectory
For eventual packaging.
2022-05-01 16:24:24 -06:00