Commit Graph

1668 Commits

Author SHA1 Message Date
James Betker
48e3ee9a5b Shuffle conditioning inputs along the positional axis to reduce fitting on prosody and other positional information
The mels should still retain some short-range positional information the model can use
for tone and frequencies, for example.
2021-12-20 19:05:56 -07:00
James Betker
53858b2055 Fix gpt_tts_hf inference 2021-12-20 17:45:26 -07:00
James Betker
712d746e9b gpt_tts: format conditioning inputs more for contextual voice clues and less for prosody
also support single conditional inputs
2021-12-19 17:42:29 -07:00
James Betker
c813befd53 Remove dedicated positioning embeddings 2021-12-19 09:01:31 -07:00
James Betker
b4ddcd7111 More inference improvements 2021-12-19 09:01:19 -07:00
James Betker
f9c45d70f0 Fix mel terminator 2021-12-18 17:18:06 -07:00
James Betker
937045cb63 Fixes 2021-12-18 16:45:38 -07:00
James Betker
9b9f7ea61b GptTtsHf: Make the input/target placement easier to reason about 2021-12-17 10:24:14 -07:00
James Betker
2fb4213a3e More lossy fixes 2021-12-17 10:01:42 -07:00
James Betker
dee34f096c Add use_gpt_tts script 2021-12-16 23:28:54 -07:00
James Betker
9e8a9bf6ca Various fixes to gpt_tts_hf 2021-12-16 23:28:44 -07:00
James Betker
62c8ed9a29 move speech utils 2021-12-16 20:47:37 -07:00
James Betker
e7957e4897 Make loss accumulator for logs accumulate better 2021-12-12 22:23:17 -07:00
James Betker
4f8c4d130c gpt_tts_hf: pad mel tokens with an <end_of_sequence> token. 2021-12-12 20:04:50 -07:00
James Betker
76f86c0e47 gaussian_diffusion: support fp16 2021-12-12 19:52:21 -07:00
James Betker
aa7cfd1edf Add support for mel norms across the channel dim 2021-12-12 19:52:08 -07:00
James Betker
8917c02a4d gpt_tts_hf inference first pass 2021-12-12 19:51:44 -07:00
James Betker
63bf135b93 Support norms 2021-12-11 08:30:49 -07:00
James Betker
959979086d fix 2021-12-11 08:18:00 -07:00
James Betker
5a664aa56e misc 2021-12-11 08:17:26 -07:00
James Betker
d610540ce5 mel norm computation script 2021-12-11 08:16:50 -07:00
James Betker
306274245b Also do dynamic range compression across mel 2021-12-10 20:06:24 -07:00
James Betker
faf55684b8 Use slaney norm in the mel filterbank computation 2021-12-10 20:04:52 -07:00
James Betker
b2d8fbcfc0 build a better speech synthesis toolset 2021-12-09 22:59:56 -07:00
James Betker
32cfcf3684 Turn off optimization in find_faulty_files 2021-12-09 09:02:09 -07:00
James Betker
a66a2bf91b Update find_faulty_files 2021-12-09 09:00:00 -07:00
James Betker
9191201f05 asd 2021-12-07 09:55:39 -07:00
James Betker
ef15a39841 fix gdi bug? 2021-12-07 09:53:48 -07:00
James Betker
6ccff3f49f Record codes more often 2021-12-07 09:22:45 -07:00
James Betker
d0b2f931bf Add feature to diffusion vocoder where the spectrogram conditioning layers can be re-trained apart from the rest of the model 2021-12-07 09:22:30 -07:00
James Betker
662920bde3 Log codes when simply fetching codebook_indices 2021-12-06 09:21:43 -07:00
James Betker
380a5d5475 gdi.. 2021-12-03 08:53:09 -07:00
James Betker
101a01f744 Fix dvae codes issue 2021-12-02 23:28:36 -07:00
James Betker
31fc693a8a dafsdf 2021-12-02 22:55:36 -07:00
James Betker
040d998922 maasd 2021-12-02 22:53:48 -07:00
James Betker
cc10e7e7e8 Add tsv loader 2021-12-02 22:43:07 -07:00
James Betker
702607556d nv_tacotron_dataset: allow it to load conditioning signals 2021-12-02 22:14:44 -07:00
James Betker
07b0124712 GptTtsHf! 2021-12-02 21:48:42 -07:00
James Betker
85542ec547 One last fix for gpt_asr_hf2 2021-12-02 21:19:28 -07:00
James Betker
68e9db12b5 Add interleaving and direct injectors 2021-12-02 21:04:49 -07:00
James Betker
04454ee63a Add evaluation logic for gpt_asr_hf2 2021-12-02 21:04:36 -07:00
James Betker
47fe032a3d Try to make diffusion validator more reproducible 2021-11-24 09:38:10 -07:00
James Betker
5956eb757c ffffff 2021-11-24 00:19:47 -07:00
James Betker
f1ed0588e3 another fix 2021-11-24 00:11:21 -07:00
James Betker
7a3c4a4fc6 Fix lr quantizer decode 2021-11-24 00:01:26 -07:00
James Betker
3f6ecfe0db q fix 2021-11-23 23:50:27 -07:00
James Betker
d9747fe623 Integrate with lr_quantizer 2021-11-23 19:48:22 -07:00
James Betker
82d0e7720e Add choke to lucidrains_dvae 2021-11-23 18:53:37 -07:00
James Betker
934395d4b8 A few fixes for gpt_asr_hf2 2021-11-23 09:29:29 -07:00
James Betker
3b5c3d85d8 Allow specification of wandb run name 2021-11-22 17:31:29 -07:00