James Betker
0237e96b34
Fix dvae bug
2021-08-06 14:17:01 -06:00
James Betker
0799d95af5
Use quantizer from rosinality/vqvae with openai dvae
2021-08-06 14:06:26 -06:00
James Betker
d3ace153af
Add logic for performing inference using gpt_tts with dual-encoder modes
2021-08-06 12:04:12 -06:00
James Betker
b43683b772
Add lucidrains_dvae
2021-08-06 12:03:46 -06:00
James Betker
62c7570512
Constrain wav_aug a bit more
2021-08-06 08:19:38 -06:00
James Betker
f126040da2
Undo noise first
2021-08-05 23:24:38 -06:00
James Betker
908ef5495f
Add noise first to audio_aug
2021-08-05 23:22:44 -06:00
James Betker
d6007c6de1
dataset fixes
2021-08-05 23:12:59 -06:00
James Betker
3ca51e80b2
Only fix weird path bug in windows
2021-08-05 22:21:25 -06:00
James Betker
70dcd1107f
Fix byol_model_wrapper to function with audio inputs
2021-08-05 22:20:22 -06:00
James Betker
f86df53ce0
Export extract_byol_model as a function
2021-08-05 22:15:26 -06:00
James Betker
89d15c9e74
Move gpt-tts back to lucidrains implementation
Much better performance.
2021-08-05 22:15:13 -06:00
James Betker
d120e1aa99
Add audio augmentation to wavfile_dataset, utility to test audio similarity
2021-08-05 22:14:49 -06:00
James Betker
c0f61a2e15
Rework how DVAE tokens are ordered
It might make more sense to have top tokens, then bottom tokens
with top tokens having different discretized values.
2021-08-05 07:07:17 -06:00
James Betker
4017236ba9
Fix up inference for gpt_tts
2021-08-05 06:46:30 -06:00
James Betker
5037220ac7
Mods to support contrastive learning on audio files
2021-08-05 05:57:04 -06:00
James Betker
341f28dd82
It works!
2021-08-04 20:07:51 -06:00
James Betker
36c7c1fbdb
Fix training flow for NEXT TOKEN prediction instead of same token prediction
doh
2021-08-04 10:28:09 -06:00
James Betker
d9936df363
Add gpt_tts dataset and implement inference
- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
2021-08-04 00:44:04 -06:00
James Betker
4c98b9703f
Get dalle-style TTS to "work"
2021-08-03 21:08:27 -06:00
James Betker
2814307eee
Alterations to support VQVAE on mel spectrograms
2021-08-01 07:54:21 -06:00
James Betker
965f6e6b52
Fixes to weight_decay in adamw
2021-07-31 15:58:41 -06:00
James Betker
0c9e75bc69
Improvements to GptTts
2021-07-31 15:57:57 -06:00
James Betker
31ee9ae262
Checkin
2021-07-30 23:07:35 -06:00
James Betker
dadc54795c
Add gpt_tts
2021-07-27 20:33:30 -06:00
James Betker
398185e109
More work on wave-diffusion
2021-07-27 05:36:17 -06:00
James Betker
49e3b310ea
Allow audio sample rate interpolation for faster training
2021-07-26 17:44:06 -06:00
James Betker
96e90e7047
Add support for a gaussian-diffusion-based wave tacotron
2021-07-26 16:27:31 -06:00
James Betker
97d7cbbc34
Additional work for audio xformer (which doesn't really do a great job)
2021-07-23 10:58:14 -06:00
James Betker
2325e7a88c
Allow inference for vqvae
2021-07-20 10:40:05 -06:00
James Betker
d81386c1be
Mods to support vqvae in audio mode (1d)
2021-07-20 08:36:46 -06:00
James Betker
5584cfcc7a
tacotron2 work
2021-07-14 21:41:57 -06:00
James Betker
fe0c699ced
Various fixes
2021-07-14 00:08:42 -06:00
James Betker
be2745f42d
Add waveglow & inference capabilities to audio generator
2021-07-08 23:07:36 -06:00
James Betker
1ff434218e
tacotron2, ready for prime time!
2021-07-08 22:13:44 -06:00
James Betker
86fd3ad7fd
Initial checkin of nvidia tacotron model & dataset
These two are tested, full support for training to come.
2021-07-06 11:11:35 -06:00
James Betker
3801d5d55e
diffusion surfin'
2021-07-06 09:36:52 -06:00
James Betker
afa41f1804
Allow hq color jittering and corruptions that are not included in the corruption factor
2021-06-30 09:44:46 -06:00
James Betker
6fd16ea9c8
Add meta-anomaly detection, colorjitter augmentation
2021-06-29 13:41:55 -06:00
James Betker
46e9f62be0
Add unet with latent guide
This is a diffusion network that uses both a LQ image
and a reference sample HQ image that is compressed into
a latent vector to perform upsampling.
The hope is that we can steer the upsampling network
with sample images.
2021-06-26 11:02:58 -06:00
James Betker
0ded106562
Merge remote-tracking branch 'origin/master'
2021-06-25 13:16:28 -06:00
James Betker
a57ed8e960
Various mods to support better jpeg image filtering
2021-06-25 13:16:15 -06:00
James Betker
61e7ca39cd
Update image_folder_dataset.py
2021-06-25 11:48:31 -06:00
James Betker
a0ef07ddb8
Create unet_latent_guide.py
2021-06-25 11:25:14 -06:00
James Betker
e7890dc0ba
Misc fixes for diffusion nets
2021-06-21 10:38:07 -06:00
James Betker
8e3a33e001
Fix a bug where non-rank-0 is computing FID before all images are saved.
2021-06-16 16:27:09 -06:00
James Betker
68cbbed886
Add some cool diffusion testing scripts
2021-06-16 16:26:36 -06:00
James Betker
ae8de0cb9d
fid saving images across all rank fix
2021-06-15 10:31:07 -06:00
James Betker
6a75bd0777
Another fix
2021-06-14 09:51:44 -06:00
James Betker
54bff35171
Fix issue where eval was not being used by all ddp processes
2021-06-14 09:50:04 -06:00