Commit Graph

307 Commits

Author SHA1 Message Date
James Betker
6c6daa5795 Build a bigger, better tokenizer 2021-12-22 17:46:18 -07:00
James Betker
c737632eae Train and use a bespoke tokenizer 2021-12-22 15:06:14 -07:00
James Betker
a9629f7022 Try out using the GPT tokenizer rather than nv_tacotron
This results in a significant compression of the text domain; I'm curious what the
effect on speech quality will be.
2021-12-22 14:03:18 -07:00
James Betker
ced81a760b restore nv_tacotron 2021-12-22 13:48:53 -07:00
James Betker
7bf4f9f580 duplicate nvtacotron 2021-12-22 13:48:30 -07:00
James Betker
9e8a9bf6ca Various fixes to gpt_tts_hf 2021-12-16 23:28:44 -07:00
James Betker
31fc693a8a dafsdf 2021-12-02 22:55:36 -07:00
James Betker
040d998922 maasd 2021-12-02 22:53:48 -07:00
James Betker
cc10e7e7e8 Add tsv loader 2021-12-02 22:43:07 -07:00
James Betker
702607556d nv_tacotron_dataset: allow it to load conditioning signals 2021-12-02 22:14:44 -07:00
James Betker
0604060580 Finish up mods for next version of GptAsrHf 2021-11-20 21:33:49 -07:00
James Betker
9b3c3b1227 use sets instead of list ops 2021-11-07 20:45:57 -07:00
James Betker
722d3dbdc2 f 2021-11-07 18:52:05 -07:00
James Betker
18b1de9b2c Add exclusion_lists to unsupervised_audio_dataset 2021-11-07 18:46:47 -07:00
James Betker
fd14746bf8 badtimes 2021-11-03 00:33:38 -06:00
James Betker
2fa80486de tacotron_dataset: recover gracefully 2021-11-03 00:31:50 -06:00
James Betker
af51d00dee Load wav files from voxpopuli instead of oggs 2021-11-02 09:32:26 -06:00
James Betker
f7d0901ce6 Decouple MEL from nv_tacotron_dataset 2021-10-31 15:01:38 -06:00
James Betker
b8b268b5f6 Misc 2021-10-31 14:29:23 -06:00
James Betker
579f0a70ee Move UnsupervisedAudioDataset to use my new mp3 loader 2021-10-28 22:33:12 -06:00
James Betker
5d714bc566 Add deepspeech model and support for decoding with it 2021-10-27 13:09:46 -06:00
James Betker
21b6daa0ed Introduce clip resampling 2021-10-26 10:42:23 -06:00
James Betker
c3421b7f6d Dataset work for audio quality processor 2021-10-24 09:09:34 -06:00
James Betker
06ea6191a9 Initial implementation of audio_with_noise dataset 2021-10-21 16:45:19 -06:00
James Betker
d016a2fbad Go back to vanilla flavor of diffusion 2021-10-17 17:32:46 -06:00
James Betker
6833048bf7 Alterations to diffusion_dvae so it can be used directly on spectrograms 2021-09-23 15:56:25 -06:00
James Betker
359e9e27a7 unsupervised_audio_dataset: try to recover from failures of audio2numpy 2021-09-17 15:25:57 -06:00
James Betker
f78ce9d924 Get diffusion_dvae ready for prime time! 2021-09-16 22:43:10 -06:00
James Betker
1197ae1928 Misc 2021-09-16 10:53:56 -06:00
James Betker
8d9857f33d More fixes 2021-09-14 20:45:05 -06:00
James Betker
9a9c90660f Fixes 2021-09-14 18:29:17 -06:00
James Betker
e513052fca Add unsupervised_audio_dataset 2021-09-14 17:43:16 -06:00
James Betker
b8f2e0f452 mydvae 2021-09-06 17:45:30 -06:00
James Betker
30cd33fe44 another fix 2021-08-31 14:46:46 -06:00
James Betker
8810d3de97 fix wavfile_dataset 2021-08-31 14:45:29 -06:00
James Betker
dabd87246d Add unet_diffusion_vocoder 2021-08-31 14:38:33 -06:00
James Betker
570ed327ed Stop dataset - attempt #2 2021-08-18 18:29:38 -06:00
James Betker
8332923f5c Two more tools to test the audio segmentor 2021-08-17 09:09:11 -06:00
James Betker
93e903af15 Rework wavfile dataset to be usable for things other than augments 2021-08-16 22:52:35 -06:00
James Betker
d7f30232c3 Oh yeah 2021-08-16 22:52:15 -06:00
James Betker
4c01d82265 Fix for voxpopuli 2021-08-16 22:52:05 -06:00
James Betker
1fede41b7b Audio segmentor 2021-08-16 22:51:53 -06:00
James Betker
2d3372054d Add support for voxpopuli to nv_tacotron_dataset 2021-08-16 17:13:40 -06:00
James Betker
3580c52eac Fix up wavfile_dataset to be able to provide a full clip 2021-08-15 20:53:26 -06:00
James Betker
a523c4f932 Auto-normalize wav files by data type 2021-08-15 09:09:51 -06:00
James Betker
c28f657ab8 Allow usage of pre-rendered mels saved to npy files 2021-08-14 23:38:15 -06:00
James Betker
ad3391bd96 Fix nan issue when interpolating audio 2021-08-14 20:42:01 -06:00
James Betker
769f0acc53 Moar fix 2021-08-14 17:23:15 -06:00
James Betker
3d2e724083 Fix audio ranging problem 2021-08-14 17:18:55 -06:00
James Betker
d6a73acaed Allow processing of multiple audio sources at once from nv_tacotron_dataset 2021-08-14 16:04:05 -06:00
James Betker
007976082b GPT_asr for inference 2021-08-14 14:37:17 -06:00
James Betker
72622b4d61 Allow saving mel strips as files from the dataset implementation 2021-08-13 22:46:41 -06:00
James Betker
cfd284f425 Fix up some stuff that allows the MEL to be computed on-GPU 2021-08-13 18:35:55 -06:00
James Betker
fff1a59e08 max/min mel invalid fix 2021-08-13 09:36:31 -06:00
James Betker
4b2946e581 More fix 2021-08-12 15:51:23 -06:00
James Betker
4c76257c71 Don't require collation for nv_tacotron 2021-08-12 15:44:55 -06:00
James Betker
5b07d3b623 Found error that I was trying to fix with reload=True 2021-08-12 15:22:34 -06:00
James Betker
430b650a34 ...... 2021-08-12 10:31:10 -06:00
James Betker
b35d6ae028 Print some metrics from tacotron dataset when it croaks 2021-08-12 09:21:12 -06:00
James Betker
0c4d6b1916 Just offer generic re-load for nv-tacotron 2021-08-12 09:09:12 -06:00
James Betker
154f5aa73c Fix annoying warning and add to requirements 2021-08-11 17:32:06 -06:00
James Betker
f04a7bdf63 Bug fixes for tacotron dataset on mozilla cv
- Support a max mel length (mozilla cv has some tracks that are basically unbounded..)
- Don't fail on low sample rates (mozilla cv has some of those)
2021-08-11 16:17:03 -06:00
James Betker
2d3f0cc33c nv_tacotron_dataset - Allow training on mozilla cv 2021-08-11 13:34:31 -06:00
James Betker
d0c74278bf Enable multiple wavfile paths to be specified, fix eps bug in mp3 splitter 2021-08-11 08:46:02 -06:00
James Betker
e19c00398e More improvements to random_mp3_splitter 2021-08-09 21:31:12 -06:00
James Betker
74342b860b Revert "Undo forced text padding"
This reverts commit 83ab5e6a00.
2021-08-09 11:56:34 -06:00
James Betker
d4e33bf15f Fixes to the mp3 splitter 2021-08-09 11:55:46 -06:00
James Betker
4100469902 Add a tool to split mp3 files into arbitrary chunks of wav files 2021-08-08 23:23:13 -06:00
James Betker
83ab5e6a00 Undo forced text padding 2021-08-08 11:42:20 -06:00
James Betker
690d7e86d3 Fix nv_tacotron_dataset bug which incorrectly mapped filenames
dammit..
2021-08-08 11:38:52 -06:00
James Betker
a2afb25e42 Fix inference, always flow full text tokens through transformer 2021-08-07 20:11:10 -06:00
James Betker
b43683b772 Add lucidrains_dvae 2021-08-06 12:03:46 -06:00
James Betker
62c7570512 Constrain wav_aug a bit more 2021-08-06 08:19:38 -06:00
James Betker
f126040da2 Undo noise first 2021-08-05 23:24:38 -06:00
James Betker
908ef5495f Add noise first to audio_aug 2021-08-05 23:22:44 -06:00
James Betker
d6007c6de1 dataset fixes 2021-08-05 23:12:59 -06:00
James Betker
d120e1aa99 Add audio augmentation to wavfile_dataset, utility to test audio similarity 2021-08-05 22:14:49 -06:00
James Betker
4017236ba9 Fix up inference for gpt_tts 2021-08-05 06:46:30 -06:00
James Betker
5037220ac7 Mods to support contrastive learning on audio files 2021-08-05 05:57:04 -06:00
James Betker
341f28dd82 It works! 2021-08-04 20:07:51 -06:00
James Betker
d9936df363 Add gpt_tts dataset and implement inference
- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
2021-08-04 00:44:04 -06:00
James Betker
dadc54795c Add gpt_tts 2021-07-27 20:33:30 -06:00
James Betker
49e3b310ea Allow audio sample rate interpolation for faster training 2021-07-26 17:44:06 -06:00
James Betker
96e90e7047 Add support for a gaussian-diffusion-based wave tacotron 2021-07-26 16:27:31 -06:00
James Betker
d81386c1be Mods to support vqvae in audio mode (1d) 2021-07-20 08:36:46 -06:00
James Betker
1ff434218e tacotron2, ready for prime time! 2021-07-08 22:13:44 -06:00
James Betker
86fd3ad7fd Initial checkin of nvidia tacotron model & dataset
These two are tested, full support for training to come.
2021-07-06 11:11:35 -06:00
James Betker
afa41f1804 Allow hq color jittering and corruptions that are not included in the corruption factor 2021-06-30 09:44:46 -06:00
James Betker
6fd16ea9c8 Add meta-anomaly detection, colorjitter augmentation 2021-06-29 13:41:55 -06:00
James Betker
46e9f62be0 Add unet with latent guide
This is a diffusion network that uses both a LQ image
and a reference sample HQ image that is compressed into
a latent vector to perform upsampling

The hope is that we can steer the upsampling network
with sample images.
2021-06-26 11:02:58 -06:00
James Betker
0ded106562 Merge remote-tracking branch 'origin/master' 2021-06-25 13:16:28 -06:00
James Betker
a57ed8e960 Various mods to support better jpeg image filtering 2021-06-25 13:16:15 -06:00
James Betker
61e7ca39cd Update image_folder_dataset.py 2021-06-25 11:48:31 -06:00
James Betker
6b32c87dcb Try to make diffusion fid more deterministic 2021-06-14 09:27:43 -06:00
James Betker
65c474eecf Various changes to fix testing 2021-06-11 15:31:10 -06:00
James Betker
6c6e82406e Pass a corruption factor through the dataset into the upsampling network
The intuition is this will help guide the network to make better informed decisions
about how it performs upsampling based on how it perceives the underlying content.

(I'm giving up on letting networks detect their own quality - I'm not convinced it is
actually feasible)
2021-06-07 09:13:54 -06:00
James Betker
fb405d9ef1 CIFAR stuff
- Extract coarse labels for the CIFAR dataset
- Add simple resnet that branches lower layers based on coarse labels
- Some other cleanup
2021-06-05 14:16:02 -06:00
James Betker
e6c537824a Allow validation for ce 2021-06-04 21:21:04 -06:00
James Betker
7c251af7a8 Support cifar100 with resnet 2021-06-04 17:29:07 -06:00
James Betker
6084915af8 Support gaussian diffusion models
Adds support for GD models, courtesy of some maths from openai.

Also:
- Fixes requirement for eval{} even when it isn't being used
- Adds support for denormalizing an imagenet norm
2021-06-02 21:47:32 -06:00