James Betker
70dcd1107f
Fix byol_model_wrapper to function with audio inputs
2021-08-05 22:20:22 -06:00
James Betker
89d15c9e74
Move gpt-tts back to lucidrains implementation
...
Much better performance.
2021-08-05 22:15:13 -06:00
James Betker
d120e1aa99
Add audio augmentation to wavfile_dataset, utility to test audio similary
2021-08-05 22:14:49 -06:00
James Betker
c0f61a2e15
Rework how DVAE tokens are ordered
...
It might make more sense to have top tokens, then bottom tokens
with top tokens having different discretized values.
2021-08-05 07:07:17 -06:00
James Betker
4017236ba9
Fix up inference for gpt_tts
2021-08-05 06:46:30 -06:00
James Betker
5037220ac7
Mods to support contrastive learning on audio files
2021-08-05 05:57:04 -06:00
James Betker
341f28dd82
It works!
2021-08-04 20:07:51 -06:00
James Betker
36c7c1fbdb
Fix training flow for NEXT TOKEN prediction instead of same token prediction
...
doh
2021-08-04 10:28:09 -06:00
James Betker
d9936df363
Add gpt_tts dataset and implement inference
...
- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
2021-08-04 00:44:04 -06:00
James Betker
4c98b9703f
Get dalle-style TTS to "work"
2021-08-03 21:08:27 -06:00
James Betker
2814307eee
Alterations to support VQVAE on mel spectrograms
2021-08-01 07:54:21 -06:00
James Betker
0c9e75bc69
Improvements to GptTts
2021-07-31 15:57:57 -06:00
James Betker
31ee9ae262
Checkin
2021-07-30 23:07:35 -06:00
James Betker
dadc54795c
Add gpt_tts
2021-07-27 20:33:30 -06:00
James Betker
398185e109
More work on wave-diffusion
2021-07-27 05:36:17 -06:00
James Betker
49e3b310ea
Allow audio sample rate interpolation for faster training
2021-07-26 17:44:06 -06:00
James Betker
96e90e7047
Add support for a gaussian-diffusion-based wave tacotron
2021-07-26 16:27:31 -06:00
James Betker
97d7cbbc34
Additional work for audio xformer (which doesnt really do a great job)
2021-07-23 10:58:14 -06:00
James Betker
d81386c1be
Mods to support vqvae in audio mode (1d)
2021-07-20 08:36:46 -06:00
James Betker
5584cfcc7a
tacotron2 work
2021-07-14 21:41:57 -06:00
James Betker
fe0c699ced
Various fixes
2021-07-14 00:08:42 -06:00
James Betker
be2745f42d
Add waveglow & inference capabilities to audio generator
2021-07-08 23:07:36 -06:00
James Betker
1ff434218e
tacotron2, ready for prime time!
2021-07-08 22:13:44 -06:00
James Betker
86fd3ad7fd
Initial checkin of nvidia tacotron model & dataset
...
These two are tested, full support for training to come.
2021-07-06 11:11:35 -06:00
James Betker
afa41f1804
Allow hq color jittering and corruptions that are not included in the corruption factor
2021-06-30 09:44:46 -06:00
James Betker
6fd16ea9c8
Add meta-anomaly detection, colorjitter augmentation
2021-06-29 13:41:55 -06:00
James Betker
46e9f62be0
Add unet with latent guide
...
This is a diffusion network that uses both a LQ image
and a reference sample HQ image that is compressed into
a latent vector to perform upsampling
The hope is that we can steer the upsampling network
with sample images.
2021-06-26 11:02:58 -06:00
James Betker
0ded106562
Merge remote-tracking branch 'origin/master'
2021-06-25 13:16:28 -06:00
James Betker
a57ed8e960
Various mods to support better jpeg image filtering
2021-06-25 13:16:15 -06:00
James Betker
a0ef07ddb8
Create unet_latent_guide.py
2021-06-25 11:25:14 -06:00
James Betker
e7890dc0ba
Misc fixes for diffusion nets
2021-06-21 10:38:07 -06:00
James Betker
65c474eecf
Various changes to fix testing
2021-06-11 15:31:10 -06:00
James Betker
220f11a5e4
Half channel sizes in cifar_resnet
2021-06-09 17:06:37 -06:00
James Betker
9b5f4abb91
Add fade in for hard switch
2021-06-07 18:15:09 -06:00
James Betker
108c5d829c
Fix dropout norm
2021-06-07 16:13:23 -06:00
James Betker
438217094c
Also debug distribution of switch
2021-06-07 15:36:07 -06:00
James Betker
44b09e5f20
Amplify dropout rate
2021-06-07 15:20:53 -06:00
James Betker
f0d4eb9182
Fixor
2021-06-07 11:58:36 -06:00
James Betker
c456a60466
Another go at fixing nan
2021-06-07 11:51:43 -06:00
James Betker
1c574c5bd1
Attempt to fix nan
2021-06-07 11:43:42 -06:00
James Betker
eda796985b
Try out dropout norm
2021-06-07 11:33:33 -06:00
James Betker
6c6e82406e
Pass a corruption factor through the dataset into the upsampling network
...
The intuition is this will help guide the network to make better informed decisions
about how it performs upsampling based on how it perceives the underlying content.
(I'm giving up on letting networks detect their own quality - I'm not convinced it is
actually feasible)
2021-06-07 09:13:54 -06:00
James Betker
061dbcd458
Another fix to anorm
2021-06-06 15:09:49 -06:00
James Betker
9a6991e461
Fix switch norm average
2021-06-06 15:04:28 -06:00
James Betker
57e1a6a0f2
cifar: add hard routing
...
Also mods switched_routing to support non-pixular inputs
2021-06-06 14:53:43 -06:00
James Betker
692e9c417b
Support diffusion unet
2021-06-06 13:57:22 -06:00
James Betker
a0158ebc69
Simplify cifar resnet further for faster training
2021-06-06 10:02:24 -06:00
James Betker
75567a9814
Only head norm removed
2021-06-05 23:29:11 -06:00
James Betker
65d0376b90
Re-add normalization at the tail of the RRDB
2021-06-05 23:04:05 -06:00
James Betker
184e887122
Remove rrdb normalization
2021-06-05 21:39:19 -06:00