Commit Graph

819 Commits

Author SHA1 Message Date
James Betker
0237e96b34 Fix dvae bug 2021-08-06 14:17:01 -06:00
James Betker
0799d95af5 Use quantizer from rosinality/vqvae with openai dvae 2021-08-06 14:06:26 -06:00
James Betker
d3ace153af Add logic for performing inference using gpt_tts with dual-encoder modes 2021-08-06 12:04:12 -06:00
James Betker
b43683b772 Add lucidrains_dvae 2021-08-06 12:03:46 -06:00
James Betker
70dcd1107f Fix byol_model_wrapper to function with audio inputs 2021-08-05 22:20:22 -06:00
James Betker
89d15c9e74 Move gpt-tts back to lucidrains implementation
Much better performance.
2021-08-05 22:15:13 -06:00
James Betker
d120e1aa99 Add audio augmentation to wavfile_dataset, utility to test audio similarity 2021-08-05 22:14:49 -06:00
James Betker
c0f61a2e15 Rework how DVAE tokens are ordered
It might make more sense to have top tokens, then bottom tokens
with top tokens having different discretized values.
2021-08-05 07:07:17 -06:00
James Betker
4017236ba9 Fix up inference for gpt_tts 2021-08-05 06:46:30 -06:00
James Betker
5037220ac7 Mods to support contrastive learning on audio files 2021-08-05 05:57:04 -06:00
James Betker
341f28dd82 It works! 2021-08-04 20:07:51 -06:00
James Betker
36c7c1fbdb Fix training flow for NEXT TOKEN prediction instead of same token prediction
doh
2021-08-04 10:28:09 -06:00
James Betker
d9936df363 Add gpt_tts dataset and implement inference
- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
2021-08-04 00:44:04 -06:00
James Betker
4c98b9703f Get dalle-style TTS to "work" 2021-08-03 21:08:27 -06:00
James Betker
2814307eee Alterations to support VQVAE on mel spectrograms 2021-08-01 07:54:21 -06:00
James Betker
0c9e75bc69 Improvements to GptTts 2021-07-31 15:57:57 -06:00
James Betker
31ee9ae262 Checkin 2021-07-30 23:07:35 -06:00
James Betker
dadc54795c Add gpt_tts 2021-07-27 20:33:30 -06:00
James Betker
398185e109 More work on wave-diffusion 2021-07-27 05:36:17 -06:00
James Betker
49e3b310ea Allow audio sample rate interpolation for faster training 2021-07-26 17:44:06 -06:00
James Betker
96e90e7047 Add support for a gaussian-diffusion-based wave tacotron 2021-07-26 16:27:31 -06:00
James Betker
97d7cbbc34 Additional work for audio xformer (which doesn't really do a great job) 2021-07-23 10:58:14 -06:00
James Betker
d81386c1be Mods to support vqvae in audio mode (1d) 2021-07-20 08:36:46 -06:00
James Betker
5584cfcc7a tacotron2 work 2021-07-14 21:41:57 -06:00
James Betker
fe0c699ced Various fixes 2021-07-14 00:08:42 -06:00
James Betker
be2745f42d Add waveglow & inference capabilities to audio generator 2021-07-08 23:07:36 -06:00
James Betker
1ff434218e tacotron2, ready for prime time! 2021-07-08 22:13:44 -06:00
James Betker
86fd3ad7fd Initial checkin of nvidia tacotron model & dataset
These two are tested, full support for training to come.
2021-07-06 11:11:35 -06:00
James Betker
afa41f1804 Allow hq color jittering and corruptions that are not included in the corruption factor 2021-06-30 09:44:46 -06:00
James Betker
6fd16ea9c8 Add meta-anomaly detection, colorjitter augmentation 2021-06-29 13:41:55 -06:00
James Betker
46e9f62be0 Add unet with latent guide
This is a diffusion network that uses both a LQ image
and a reference sample HQ image that is compressed into
a latent vector to perform upsampling

The hope is that we can steer the upsampling network
with sample images.
2021-06-26 11:02:58 -06:00
James Betker
0ded106562 Merge remote-tracking branch 'origin/master' 2021-06-25 13:16:28 -06:00
James Betker
a57ed8e960 Various mods to support better jpeg image filtering 2021-06-25 13:16:15 -06:00
James Betker
a0ef07ddb8 Create unet_latent_guide.py 2021-06-25 11:25:14 -06:00
James Betker
e7890dc0ba Misc fixes for diffusion nets 2021-06-21 10:38:07 -06:00
James Betker
65c474eecf Various changes to fix testing 2021-06-11 15:31:10 -06:00
James Betker
220f11a5e4 Half channel sizes in cifar_resnet 2021-06-09 17:06:37 -06:00
James Betker
9b5f4abb91 Add fade in for hard switch 2021-06-07 18:15:09 -06:00
James Betker
108c5d829c Fix dropout norm 2021-06-07 16:13:23 -06:00
James Betker
438217094c Also debug distribution of switch 2021-06-07 15:36:07 -06:00
James Betker
44b09e5f20 Amplify dropout rate 2021-06-07 15:20:53 -06:00
James Betker
f0d4eb9182 Fixor 2021-06-07 11:58:36 -06:00
James Betker
c456a60466 Another go at fixing nan 2021-06-07 11:51:43 -06:00
James Betker
1c574c5bd1 Attempt to fix nan 2021-06-07 11:43:42 -06:00
James Betker
eda796985b Try out dropout norm 2021-06-07 11:33:33 -06:00
James Betker
6c6e82406e Pass a corruption factor through the dataset into the upsampling network
The intuition is this will help guide the network to make better informed decisions
about how it performs upsampling based on how it perceives the underlying content.

(I'm giving up on letting networks detect their own quality - I'm not convinced it is
actually feasible)
2021-06-07 09:13:54 -06:00
James Betker
061dbcd458 Another fix to anorm 2021-06-06 15:09:49 -06:00
James Betker
9a6991e461 Fix switch norm average 2021-06-06 15:04:28 -06:00
James Betker
57e1a6a0f2 cifar: add hard routing
Also mods switched_routing to support non-pixular inputs
2021-06-06 14:53:43 -06:00
James Betker
692e9c417b Support diffusion unet 2021-06-06 13:57:22 -06:00
James Betker
a0158ebc69 Simplify cifar resnet further for faster training 2021-06-06 10:02:24 -06:00
James Betker
75567a9814 Only head norm removed 2021-06-05 23:29:11 -06:00
James Betker
65d0376b90 Re-add normalization at the tail of the RRDB 2021-06-05 23:04:05 -06:00
James Betker
184e887122 Remove rrdb normalization 2021-06-05 21:39:19 -06:00
James Betker
f5e75602b9 Add regular attention to cifar_resnet 2021-06-05 21:34:07 -06:00
James Betker
af52751d6b Fix device error 2021-06-05 14:21:32 -06:00
James Betker
5f0cc65f3b Register branched resnet properly 2021-06-05 14:19:03 -06:00
James Betker
fb405d9ef1 CIFAR stuff
- Extract coarse labels for the CIFAR dataset
- Add simple resnet that branches lower layers based on coarse labels
- Some other cleanup
2021-06-05 14:16:02 -06:00
James Betker
80d4404367 A few fixes:
- Output better prediction of xstart from eps
- Support LossAwareSampler
- Support AdamW
2021-06-05 13:40:32 -06:00
James Betker
7c251af7a8 Support cifar100 with resnet 2021-06-04 17:29:07 -06:00
James Betker
bf811f80c1 GD mods & fixes
- Report variational loss separately
- Report model prediction from injector
- Log these things
- Use respacing like guided diffusion
2021-06-04 17:13:16 -06:00
James Betker
6084915af8 Support gaussian diffusion models
Adds support for GD models, courtesy of some maths from openai.

Also:
- Fixes requirement for eval{} even when it isn't being used
- Adds support for denormalizing an imagenet norm
2021-06-02 21:47:32 -06:00
James Betker
f129eaa39e Clean up byol a bit
- Remove option to aug in dataset (there's really no reason for this now that kornia works on GPU on windows)
- Other stuff
2021-05-24 21:35:46 -06:00
James Betker
1a2b9fa130 Get rid of old byol net wrapping
Simplifies and makes this usable with DLAS' multi-gpu trainer
2021-04-27 12:48:34 -06:00
James Betker
119f17c808 Add testing capabilities for segformer & contrastive feature 2021-04-27 09:59:50 -06:00
James Betker
9bbe6fc81e Get segformer to a trainable state 2021-04-25 11:45:20 -06:00
James Betker
fc623d4b5a Add segformer model. Start work on BYOL adaptation that will support training it. 2021-04-23 17:16:46 -06:00
James Betker
17555e7d07 misc adjustments for stylegan 2021-04-21 18:14:17 -06:00
James Betker
b687ef4cd0 Misc 2021-04-21 18:09:46 -06:00
James Betker
9fc3df3f5b Switched conv: add conversion function with allowlist 2021-03-13 10:44:56 -07:00
James Betker
cf9a6da889 Fix some bugs, checkin work on vqvae3 2021-03-02 20:56:19 -07:00
James Betker
f89ea5f1c6 Mods to support lightweight_gan model 2021-03-02 20:51:48 -07:00
James Betker
39fd755baa New benchmark numbers 2021-02-08 08:09:41 -07:00
James Betker
784b96c059 Misc options to add support for training stylegan2-rosinality models:
- Allow image_folder_dataset to normalize inbound images
- ExtensibleTrainer can denormalize images on the output path
- Support .webp - an output from LSUN
- Support logistic GAN divergence loss
- Support stylegan2 TF weight extraction for discriminator
- New injector that produces latent noise (with separated paths)
- Modify FID evaluator to be operable with rosinality-style GANs
2021-02-08 08:09:21 -07:00
James Betker
e7be4bdff3 Revert 2021-02-05 08:43:07 -07:00
James Betker
6dec1f5968 Back to groupnorm 2021-02-05 08:42:11 -07:00
James Betker
336f807c8e lambda2 2021-02-05 00:00:24 -07:00
James Betker
025a5867c4 Use syncbatchnorm instead 2021-02-04 22:26:36 -07:00
James Betker
bb79fafb89 Fix groupnorm specification 2021-02-04 22:15:38 -07:00
James Betker
43da1f9c4b Convert lambda coupler to use groupnorm instead of batchnorm 2021-02-04 21:59:44 -07:00
James Betker
7070142805 Make vqvae3_hard more configurable 2021-02-04 09:03:22 -07:00
James Betker
b980028ca8 Add get_debug_values for vqvae_3_hardswitch 2021-02-03 14:12:24 -07:00
James Betker
1405ff06b8 Fix SwitchedConvHardRoutingFunction for current cuda router 2021-02-03 14:11:55 -07:00
James Betker
d7bec392dd ... 2021-02-02 23:50:25 -07:00
James Betker
b0a8fa00bc Visual dbg in vqvae3hs 2021-02-02 23:50:01 -07:00
James Betker
f5f91850fd hardswitch variant of vqvae3 2021-02-02 21:00:04 -07:00
James Betker
320edbaa3c Move switched_conv logic around a bit 2021-02-02 20:41:24 -07:00
James Betker
0dca36946f Hard Routing mods
- Turns out my custom convolution was RIDDLED with backwards bugs, which is
  why the existing implementation wasn't working so well.
- Implements the switch logic from both Mixture of Experts and Switch Transformers
  for testing purposes.
2021-02-02 20:35:58 -07:00
James Betker
29c1c3bede Register vqvae3 2021-01-29 15:26:28 -07:00
James Betker
bc20b4739e vqvae3
Changes VQVAE as follows:
- Reverts back to smaller codebook
- Adds an additional conv layer at the highest resolution for both the encoder & decoder
- Uses LeakyReLU on trunk
2021-01-29 15:24:26 -07:00
James Betker
96bc80313c Add switch norm, up dropout rate, detach selector 2021-01-26 09:31:53 -07:00
James Betker
2cdac6bd09 Add PWCNet for human optical flow 2021-01-25 08:25:44 -07:00
James Betker
51b63b2aa6 Add switched_conv with hard routing and make vqvae use it. 2021-01-25 08:25:29 -07:00
James Betker
ae4ff4a1e7 Enable lambda visualization 2021-01-23 15:53:27 -07:00
James Betker
10ec6bda1d lambda nets in switched_conv and a vqvae to use it 2021-01-23 14:57:57 -07:00
James Betker
b374dcdd46 update vqvae to double codebook size for bottom quantizer 2021-01-23 13:47:07 -07:00
James Betker
1b8a26db93 New switched_conv 2021-01-23 13:46:30 -07:00
James Betker
d919ae7148 Add VQVAE with no Conv2dTranspose 2021-01-18 08:49:59 -07:00
James Betker
587a4f4050 resnet_unet_3
I'm being really lazy here - these nets are not really different from each other
except at which layer they terminate. This one terminates at 2x downsampling,
which is simply indicative of a direction I want to go for testing these pixpro networks.
2021-01-15 14:51:03 -07:00
James Betker
038b8654b6 Pixpro: unwrap losses 2021-01-13 11:54:25 -07:00