110 Commits

Author SHA1 Message Date
James Betker
7929fd89de Refactor audio-style models into the audio folder 2022-03-15 11:06:25 -06:00
James Betker
08599b4c75 fix random_audio_crop injector 2022-03-12 20:42:29 -07:00
James Betker
d1dc8dbb35 Support tts9 2022-03-05 20:14:36 -07:00
James Betker
f87e10ffef Make deterministic sampler work with distributed training & microbatches 2022-03-04 11:50:50 -07:00
James Betker
2d1cb83c1d Add a deterministic timestep sampler, with provisions to employ it every n steps 2022-03-04 10:40:14 -07:00
James Betker
db0c3340ac Implement guidance-free diffusion in eval
And a few other fixes
2022-03-01 11:49:36 -07:00
James Betker
de1a1d501a Move audio injectors into their own file 2022-02-03 21:42:37 -07:00
James Betker
8f48848f91 misc 2022-01-22 08:23:29 -07:00
James Betker
b12f47b36d Add some noise to voice_voice_clip 2021-12-29 13:56:30 -07:00
James Betker
62c8ed9a29 move speech utils 2021-12-16 20:47:37 -07:00
James Betker
76f86c0e47 gaussian_diffusion: support fp16 2021-12-12 19:52:21 -07:00
James Betker
aa7cfd1edf Add support for mel norms across the channel dim 2021-12-12 19:52:08 -07:00
James Betker
63bf135b93 Support norms 2021-12-11 08:30:49 -07:00
James Betker
5a664aa56e misc 2021-12-11 08:17:26 -07:00
James Betker
306274245b Also do dynamic range compression across mel 2021-12-10 20:06:24 -07:00
James Betker
faf55684b8 Use slaney norm in the mel filterbank computation 2021-12-10 20:04:52 -07:00
James Betker
9191201f05 asd 2021-12-07 09:55:39 -07:00
James Betker
ef15a39841 fix gdi bug? 2021-12-07 09:53:48 -07:00
James Betker
68e9db12b5 Add interleaving and direct injectors 2021-12-02 21:04:49 -07:00
James Betker
47fe032a3d Try to make diffusion validator more reproducible 2021-11-24 09:38:10 -07:00
James Betker
934395d4b8 A few fixes for gpt_asr_hf2 2021-11-23 09:29:29 -07:00
James Betker
973f47c525 misc nonfunctional 2021-11-22 17:16:39 -07:00
James Betker
3125ca38f5 Further wandb logs 2021-11-22 16:40:19 -07:00
James Betker
0604060580 Finish up mods for next version of GptAsrHf 2021-11-20 21:33:49 -07:00
James Betker
687e0746b3 Add Torch-derived MelSpectrogramInjector 2021-11-18 20:02:45 -07:00
James Betker
c30a38cdf1 Undo baseline GDI changes 2021-11-18 20:02:09 -07:00
James Betker
f36bab95dd Audio resample injector 2021-11-10 20:06:33 -07:00
James Betker
596a62fe01 Apply fix to gpt_asr_hf and prep it for inference
Fix is that we were predicting two characters in advance, not next character
2021-11-04 10:09:24 -06:00
James Betker
993bd52d42 Add spec_augment injector 2021-11-01 18:43:11 -06:00
James Betker
928e7026c2 Mod STFT injector to be specifiable 2021-10-28 22:34:12 -06:00
James Betker
c3421b7f6d Dataset work for audio quality processor 2021-10-24 09:09:34 -06:00
James Betker
d016a2fbad Go back to vanilla flavor of diffusion 2021-10-17 17:32:46 -06:00
James Betker
e24c619387 Fix 2021-09-23 16:07:58 -06:00
James Betker
f78ce9d924 Get diffusion_dvae ready for prime time! 2021-09-16 22:43:10 -06:00
James Betker
6f48674647 Support diffusion models with extra return values & inference in diffusion_dvae 2021-09-16 10:53:46 -06:00
James Betker
b8f2e0f452 mydvae 2021-09-06 17:45:30 -06:00
James Betker
92e7e57f81 Update diffusion_noise_surfer to support audio 2021-09-01 08:34:47 -06:00
James Betker
dabd87246d Add unet_diffusion_vocoder 2021-08-31 14:38:33 -06:00
James Betker
cfd284f425 Fix up some stuff that allows the MEL to be computed on-GPU 2021-08-13 18:35:55 -06:00
James Betker
cdee31c60b GPT_ASR 2021-08-13 15:02:18 -06:00
James Betker
2814307eee Alterations to support VQVAE on mel spectrograms 2021-08-01 07:54:21 -06:00
James Betker
96e90e7047 Add support for a gaussian-diffusion-based wave tacotron 2021-07-26 16:27:31 -06:00
James Betker
97d7cbbc34 Additional work for audio xformer (which doesnt really do a great job) 2021-07-23 10:58:14 -06:00
James Betker
2325e7a88c Allow inference for vqvae 2021-07-20 10:40:05 -06:00
James Betker
d81386c1be Mods to support vqvae in audio mode (1d) 2021-07-20 08:36:46 -06:00
James Betker
be2745f42d Add waveglow & inference capabilities to audio generator 2021-07-08 23:07:36 -06:00
James Betker
e7890dc0ba Misc fixes for diffusion nets 2021-06-21 10:38:07 -06:00
James Betker
68cbbed886 Add some cool diffusion testing scripts 2021-06-16 16:26:36 -06:00
James Betker
5b4f86293f Add FID evaluator for diffusion models 2021-06-14 09:14:30 -06:00
James Betker
65c474eecf Various changes to fix testing 2021-06-11 15:31:10 -06:00
James Betker
7c5478bc2c Formatting issue with gdi 2021-06-06 16:35:37 -06:00
James Betker
692e9c417b Support diffusion unet 2021-06-06 13:57:22 -06:00
James Betker
80d4404367 A few fixes:
- Output better prediction of xstart from eps
- Support LossAwareSampler
- Support AdamW
2021-06-05 13:40:32 -06:00
James Betker
bf811f80c1 GD mods & fixes
- Report variational loss separately
- Report model prediction from injector
- Log these things
- Use respacing like guided diffusion
2021-06-04 17:13:16 -06:00
James Betker
6084915af8 Support gaussian diffusion models
Adds support for GD models, courtesy of some maths from openai.

Also:
- Fixes requirement for eval{} even when it isn't being used
- Adds support for denormalizing an imagenet norm
2021-06-02 21:47:32 -06:00
James Betker
f89ea5f1c6 Mods to support lightweight_gan model 2021-03-02 20:51:48 -07:00
James Betker
784b96c059 Misc options to add support for training stylegan2-rosinality models:
- Allow image_folder_dataset to normalize inbound images
- ExtensibleTrainer can denormalize images on the output path
- Support .webp - an output from LSUN
- Support logistic GAN divergence loss
- Support stylegan2 TF weight extraction for discriminator
- New injector that produces latent noise (with separated paths)
- Modify FID evaluator to be operable with rosinality-style GANs
2021-02-08 08:09:21 -07:00
James Betker
acf1535b14 Fix for randomresizedcrop injector 2021-01-07 16:31:43 -07:00
James Betker
04961b91cf Add random-crop injector 2021-01-07 12:14:55 -07:00
James Betker
63cf3d3126 Injector auto-registration
I love it!
2020-12-29 20:58:02 -07:00