Commit Graph

10 Commits

Author        SHA1        Message                                                  Date
James Betker  736c2626ee  build in character tokenizer                             2021-12-25 15:21:01 -07:00
James Betker  52410fd9d9  256-bpe tokenizer                                        2021-12-25 08:52:08 -07:00
James Betker  ead2a74bf0  Add debug_failures flag                                  2021-12-23 16:12:16 -07:00
James Betker  e55d949855  GrandConjoinedDataset                                    2021-12-23 14:32:33 -07:00
James Betker  b9de8a8eda  More fixes                                               2021-12-22 19:21:29 -07:00
James Betker  191e0130ee  Another fix                                              2021-12-22 18:30:50 -07:00
James Betker  6c6daa5795  Build a bigger, better tokenizer                         2021-12-22 17:46:18 -07:00
James Betker  c737632eae  Train and use a bespoke tokenizer                        2021-12-22 15:06:14 -07:00
James Betker  a9629f7022  Try out using the GPT tokenizer rather than nv_tacotron  2021-12-22 14:03:18 -07:00
                          This results in a significant compression of the text
                          domain, I'm curious what the effect on speech quality
                          will be.
James Betker  7bf4f9f580  duplicate nvtacotron                                     2021-12-22 13:48:30 -07:00