219 Commits

Author SHA1 Message Date
James Betker
48e3ee9a5b Shuffle conditioning inputs along the positional axis to reduce fitting on prosody and other positional information
The mels should still retain some short-range positional information the model can use
for tone and frequencies, for example.
2021-12-20 19:05:56 -07:00
James Betker
53858b2055 Fix gpt_tts_hf inference 2021-12-20 17:45:26 -07:00
James Betker
712d746e9b gpt_tts: format conditioning inputs more for contextual voice clues and less for prosody
also support single conditional inputs
2021-12-19 17:42:29 -07:00
James Betker
c813befd53 Remove dedicated positioning embeddings 2021-12-19 09:01:31 -07:00
James Betker
b4ddcd7111 More inference improvements 2021-12-19 09:01:19 -07:00
James Betker
f9c45d70f0 Fix mel terminator 2021-12-18 17:18:06 -07:00
James Betker
937045cb63 Fixes 2021-12-18 16:45:38 -07:00
James Betker
9b9f7ea61b GptTtsHf: Make the input/target placement easier to reason about 2021-12-17 10:24:14 -07:00
James Betker
2fb4213a3e More lossy fixes 2021-12-17 10:01:42 -07:00
James Betker
9e8a9bf6ca Various fixes to gpt_tts_hf 2021-12-16 23:28:44 -07:00
James Betker
62c8ed9a29 move speech utils 2021-12-16 20:47:37 -07:00
James Betker
4f8c4d130c gpt_tts_hf: pad mel tokens with an <end_of_sequence> token. 2021-12-12 20:04:50 -07:00
James Betker
8917c02a4d gpt_tts_hf inference first pass 2021-12-12 19:51:44 -07:00
James Betker
5a664aa56e misc 2021-12-11 08:17:26 -07:00
James Betker
6ccff3f49f Record codes more often 2021-12-07 09:22:45 -07:00
James Betker
d0b2f931bf Add feature to diffusion vocoder where the spectrogram conditioning layers can be re-trained apart from the rest of the model 2021-12-07 09:22:30 -07:00
James Betker
662920bde3 Log codes when simply fetching codebook_indices 2021-12-06 09:21:43 -07:00
James Betker
380a5d5475 gdi.. 2021-12-03 08:53:09 -07:00
James Betker
101a01f744 Fix dvae codes issue 2021-12-02 23:28:36 -07:00
James Betker
07b0124712 GptTtsHf! 2021-12-02 21:48:42 -07:00
James Betker
85542ec547 One last fix for gpt_asr_hf2 2021-12-02 21:19:28 -07:00
James Betker
04454ee63a Add evaluation logic for gpt_asr_hf2 2021-12-02 21:04:36 -07:00
James Betker
5956eb757c ffffff 2021-11-24 00:19:47 -07:00
James Betker
f1ed0588e3 another fix 2021-11-24 00:11:21 -07:00
James Betker
7a3c4a4fc6 Fix lr quantizer decode 2021-11-24 00:01:26 -07:00
James Betker
3f6ecfe0db q fix 2021-11-23 23:50:27 -07:00
James Betker
d9747fe623 Integrate with lr_quantizer 2021-11-23 19:48:22 -07:00
James Betker
82d0e7720e Add choke to lucidrains_dvae 2021-11-23 18:53:37 -07:00
James Betker
934395d4b8 A few fixes for gpt_asr_hf2 2021-11-23 09:29:29 -07:00
James Betker
01e635168b whoops 2021-11-22 17:24:13 -07:00
James Betker
973f47c525 misc nonfunctional 2021-11-22 17:16:39 -07:00
James Betker
3125ca38f5 Further wandb logs 2021-11-22 16:40:19 -07:00
James Betker
0604060580 Finish up mods for next version of GptAsrHf 2021-11-20 21:33:49 -07:00
James Betker
14f3155ec4 misc 2021-11-20 17:45:14 -07:00
James Betker
555b7e52ad Add rev2 of GptAsrHf 2021-11-18 20:02:24 -07:00
James Betker
1287915f3c Fix dvae test failure 2021-11-18 00:58:36 -07:00
James Betker
019acfa4c5 Allow flat dvae 2021-11-18 00:53:42 -07:00
James Betker
f3db41f125 Fix code logging 2021-11-18 00:34:37 -07:00
James Betker
c584320cf3 Fix gpt_asr_hf distillation 2021-11-07 21:53:21 -07:00
James Betker
a367ea3fda Add script for computing attention for gpt_asr 2021-11-07 18:42:06 -07:00
James Betker
756b4dad09 Working gpt_asr_hf inference - and it's a beast! 2021-11-06 21:47:15 -06:00
James Betker
596a62fe01 Apply fix to gpt_asr_hf and prep it for inference
Fix is that we were predicting two characters in advance, not next character
2021-11-04 10:09:24 -06:00
James Betker
993bd52d42 Add spec_augment injector 2021-11-01 18:43:11 -06:00
James Betker
4cff774b0e Reduce complexity of the encoder for gpt_asr_hf 2021-11-01 17:02:28 -06:00
James Betker
da55ca0438 gpt_asr using the Hugging Face transformer 2021-11-01 17:00:22 -06:00
James Betker
83cccef9d8 Condition on full signal 2021-10-30 19:58:34 -06:00
James Betker
df45a9dec2 Fix inference mode for lucidrains_gpt 2021-10-30 16:59:18 -06:00
James Betker
92fe8b4dd9 ffffpt2 2021-10-29 17:29:49 -06:00
James Betker
95ca88efce Fix feedforward 2021-10-29 17:27:51 -06:00
James Betker
b476516340 Check in backing changes (which may have broken something?) 2021-10-29 17:22:33 -06:00
James Betker
986fc9628d Check in GPT with new inference methods (but not the backing code..) 2021-10-29 17:21:40 -06:00
James Betker
58494b0888 Add support for distilling gpt_asr 2021-10-27 13:10:07 -06:00
James Betker
3a9d1c53ea Rework conditioning inputs provided 2021-10-26 10:46:33 -06:00
James Betker
43e389aac6 Add time_embed_dim_multiplier 2021-10-26 08:55:55 -06:00
James Betker
ba6e46c02a Further simplify diffusion_vocoder and make noise_surfer work 2021-10-26 08:54:30 -06:00
James Betker
0ee1c67ce5 Rework how conditioning inputs are applied to DiffusionVocoder 2021-10-24 09:08:58 -06:00
James Betker
0dee15f875 base DVAE & vector_quantizer 2021-10-20 21:19:38 -06:00
James Betker
a6f0f854b9 Fix codes when inferring from dvae 2021-10-17 22:51:17 -06:00
James Betker
d016a2fbad Go back to vanilla flavor of diffusion 2021-10-17 17:32:46 -06:00
James Betker
23da073037 Norm decoder outputs now 2021-10-16 09:07:10 -06:00
James Betker
0edc98f6c4 Throw out the idea of conditioning on discrete codes. Oh well :( 2021-10-16 09:02:01 -06:00
James Betker
62c8c5d93e Zero out spectrogram code inputs initially. 2021-10-15 12:10:11 -06:00
James Betker
1d0b44ebc2 More tweaks to diffusion-vocoder 2021-10-15 11:51:17 -06:00
James Betker
3b19581f9a Allow num_resblocks to be specified per-level 2021-10-14 11:26:04 -06:00
James Betker
83798887a8 Mods to support unet diffusion vocoder with conditioning 2021-10-13 21:23:18 -06:00
James Betker
33120cb35c Add norming to discretization_loss 2021-10-06 17:10:50 -06:00
James Betker
f2977d360c Allow attention_dim in channel attention to be specified, add converter 2021-10-05 17:29:38 -06:00
James Betker
9c0d7288ea Discretization loss attempt 2021-10-04 20:59:21 -06:00
James Betker
66f99a159c Rev2 2021-10-03 15:20:50 -06:00
James Betker
09f373e3b1 Add dvae with channel attention 2021-10-03 10:52:01 -06:00
James Betker
0396a9d2ca Increase baseline codes recording across all dvae models 2021-09-30 08:09:07 -06:00
James Betker
6e550edfe3 Attentive dvae 2021-09-29 14:17:29 -06:00
James Betker
6833048bf7 Alterations to diffusion_dvae so it can be used directly on spectrograms 2021-09-23 15:56:25 -06:00
James Betker
a6544f1684 More checkpointing fixes 2021-09-16 23:12:43 -06:00
James Betker
f78ce9d924 Get diffusion_dvae ready for prime time! 2021-09-16 22:43:10 -06:00
James Betker
0382660159 Get diffusion_dvae functional 2021-09-14 17:43:31 -06:00
James Betker
b8f2e0f452 mydvae 2021-09-06 17:45:30 -06:00
James Betker
dabd87246d Add unet_diffusion_vocoder 2021-08-31 14:38:33 -06:00
James Betker
909754cc27 Add find_faulty_files.py 2021-08-25 18:00:43 -06:00
James Betker
08b33c8e3a Support silu activation 2021-08-25 09:03:14 -06:00
James Betker
67bf7f5219 dvae mods
Trying to squeeze as much performance out of this net as possible
2021-08-25 08:55:13 -06:00
James Betker
b521d94b01 Make gpt-asr more configurable 2021-08-19 16:33:41 -06:00
James Betker
570ed327ed Stop dataset - attempt #2 2021-08-18 18:29:38 -06:00
James Betker
17453ccbe8 Revert mods to lrdvae
They didn't really change anything
2021-08-17 09:09:29 -06:00
James Betker
8332923f5c Two more tools to test the audio segmentor 2021-08-17 09:09:11 -06:00
James Betker
1fede41b7b Audio segmentor 2021-08-16 22:51:53 -06:00
James Betker
729c1fd5a9 Fix up max lengths to save memory 2021-08-15 21:29:28 -06:00
James Betker
9e47e64d5a Add gpt_segmentor model
The idea is to specifically train a model that extracts phrases from
audio clips.
2021-08-15 21:23:07 -06:00
James Betker
a826d5f658 Mods to dvae
- Add resblock to each layer
- Increase filter size for each layer
- Use SiLU
2021-08-15 20:54:10 -06:00
James Betker
b8bec22f1a Fix gpt_asr inference bug 2021-08-15 20:53:42 -06:00
James Betker
98057b6516 Make lrdvae use quantized mode in eval() 2021-08-14 23:43:01 -06:00
James Betker
007976082b GPT_asr for inference 2021-08-14 14:37:17 -06:00
James Betker
e1bdd3f7c7 Fix gpt_asr bug. Initial implementation of beam search 2021-08-13 22:47:00 -06:00
James Betker
cdee31c60b GPT_ASR 2021-08-13 15:02:18 -06:00
James Betker
20586a8edc Fix LRDVAE bug with quantizer integration 2021-08-11 16:17:22 -06:00
James Betker
82fc69abfa Add "pure" evaluator
Which simply computes the training loss against an eval dataset
2021-08-09 14:58:35 -06:00
James Betker
080bea2f19 No, really 2021-08-09 12:02:31 -06:00
James Betker
e1ce4671e4 Apply dropout to gpt_tts, get rid of min_gpt implementation 2021-08-09 12:01:10 -06:00
James Betker
1068f53b78 Add a sampling beam search 2021-08-09 11:56:06 -06:00
James Betker
01cfae28d8 Beam search implementation in one pass? Dayyyum 2021-08-08 23:22:42 -06:00
James Betker
690d7e86d3 Fix nv_tacotron_dataset bug which incorrectly mapped filenames
dammit..
2021-08-08 11:38:52 -06:00
James Betker
a2afb25e42 Fix inference, always flow full text tokens through transformer 2021-08-07 20:11:10 -06:00
James Betker
4c678172d6 ugh 2021-08-06 22:10:18 -06:00
James Betker
e723137273 Make gpttts more configurable 2021-08-06 22:08:51 -06:00
James Betker
a7496b661c combined dvae ftw 2021-08-06 22:01:06 -06:00
James Betker
0237e96b34 Fix dvae bug 2021-08-06 14:17:01 -06:00
James Betker
0799d95af5 Use quantizer from rosinality/vqvae with openai dvae 2021-08-06 14:06:26 -06:00
James Betker
d3ace153af Add logic for performing inference using gpt_tts with dual-encoder modes 2021-08-06 12:04:12 -06:00
James Betker
b43683b772 Add lucidrains_dvae 2021-08-06 12:03:46 -06:00
James Betker
89d15c9e74 Move gpt-tts back to lucidrains implementation
Much better performance.
2021-08-05 22:15:13 -06:00
James Betker
c0f61a2e15 Rework how DVAE tokens are ordered
It might make more sense to have top tokens, then bottom tokens
with top tokens having different discretized values.
2021-08-05 07:07:17 -06:00
James Betker
4017236ba9 Fix up inference for gpt_tts 2021-08-05 06:46:30 -06:00
James Betker
341f28dd82 It works! 2021-08-04 20:07:51 -06:00
James Betker
36c7c1fbdb Fix training flow for NEXT TOKEN prediction instead of same token prediction
doh
2021-08-04 10:28:09 -06:00
James Betker
d9936df363 Add gpt_tts dataset and implement inference
- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
2021-08-04 00:44:04 -06:00
James Betker
4c98b9703f Get dalle-style TTS to "work" 2021-08-03 21:08:27 -06:00
James Betker
0c9e75bc69 Improvements to GptTts 2021-07-31 15:57:57 -06:00
James Betker
31ee9ae262 Checkin 2021-07-30 23:07:35 -06:00
James Betker
dadc54795c Add gpt_tts 2021-07-27 20:33:30 -06:00