DL-Art-School

Author	SHA1	Message	Date
James Betker	82d0e7720e	Add choke to lucidrains_dvae	2021-11-23 18:53:37 -07:00
James Betker	934395d4b8	A few fixes for gpt_asr_hf2	2021-11-23 09:29:29 -07:00
James Betker	01e635168b	whoops	2021-11-22 17:24:13 -07:00
James Betker	973f47c525	misc nonfunctional	2021-11-22 17:16:39 -07:00
James Betker	3125ca38f5	Further wandb logs	2021-11-22 16:40:19 -07:00
James Betker	0604060580	Finish up mods for next version of GptAsrHf	2021-11-20 21:33:49 -07:00
James Betker	14f3155ec4	misc	2021-11-20 17:45:14 -07:00
James Betker	555b7e52ad	Add rev2 of GptAsrHf	2021-11-18 20:02:24 -07:00
James Betker	1287915f3c	Fix dvae test failure	2021-11-18 00:58:36 -07:00
James Betker	019acfa4c5	Allow flat dvae	2021-11-18 00:53:42 -07:00
James Betker	f3db41f125	Fix code logging	2021-11-18 00:34:37 -07:00
James Betker	79367f753d	Fix error & add nonfinite warning	2021-11-09 23:58:41 -07:00
James Betker	c584320cf3	Fix gpt_asr_hf distillation	2021-11-07 21:53:21 -07:00
James Betker	a367ea3fda	Add script for computing attention for gpt_asr	2021-11-07 18:42:06 -07:00
James Betker	756b4dad09	Working gpt_asr_hf inference - and it's a beast!	2021-11-06 21:47:15 -06:00
James Betker	596a62fe01	Apply fix to gpt_asr_hf and prep it for inference. Fix is that we were predicting two characters in advance, not the next character.	2021-11-04 10:09:24 -06:00
James Betker	993bd52d42	Add spec_augment injector	2021-11-01 18:43:11 -06:00
James Betker	4cff774b0e	Reduce complexity of the encoder for gpt_asr_hf	2021-11-01 17:02:28 -06:00
James Betker	da55ca0438	gpt_asr using the Hugging Face transformer	2021-11-01 17:00:22 -06:00
James Betker	83cccef9d8	Condition on full signal	2021-10-30 19:58:34 -06:00
James Betker	df45a9dec2	Fix inference mode for lucidrains_gpt	2021-10-30 16:59:18 -06:00
James Betker	92fe8b4dd9	ffffpt2	2021-10-29 17:29:49 -06:00
James Betker	95ca88efce	Fix feedforward	2021-10-29 17:27:51 -06:00
James Betker	b476516340	Check in backing changes (which may have broken something?)	2021-10-29 17:22:33 -06:00
James Betker	986fc9628d	Check in GPT with new inference methods (but not the backing code..)	2021-10-29 17:21:40 -06:00
James Betker	58494b0888	Add support for distilling gpt_asr	2021-10-27 13:10:07 -06:00
James Betker	5d714bc566	Add deepspeech model and support for decoding with it	2021-10-27 13:09:46 -06:00
James Betker	3a9d1c53ea	Rework conditioning inputs provided	2021-10-26 10:46:33 -06:00
James Betker	43e389aac6	Add time_embed_dim_multiplier	2021-10-26 08:55:55 -06:00
James Betker	ba6e46c02a	Further simplify diffusion_vocoder and make noise_surfer work	2021-10-26 08:54:30 -06:00
James Betker	0ee1c67ce5	Rework how conditioning inputs are applied to DiffusionVocoder	2021-10-24 09:08:58 -06:00
James Betker	06ea6191a9	Initial implementation of audio_with_noise dataset	2021-10-21 16:45:19 -06:00
James Betker	0dee15f875	base DVAE & vector_quantizer	2021-10-20 21:19:38 -06:00
James Betker	f2a31702b5	Clean stuff up, move more things into arch_util	2021-10-20 21:19:25 -06:00
James Betker	a6f0f854b9	Fix codes when inferring from dvae	2021-10-17 22:51:17 -06:00
James Betker	d016a2fbad	Go back to vanilla flavor of diffusion	2021-10-17 17:32:46 -06:00
James Betker	23da073037	Norm decoder outputs now	2021-10-16 09:07:10 -06:00
James Betker	0edc98f6c4	Throw out the idea of conditioning on discrete codes. Oh well :(	2021-10-16 09:02:01 -06:00
James Betker	62c8c5d93e	Zero out spectrogram code inputs initially.	2021-10-15 12:10:11 -06:00
James Betker	1d0b44ebc2	More tweaks to diffusion-vocoder	2021-10-15 11:51:17 -06:00
James Betker	3b19581f9a	Allow num_resblocks to be specified per-level	2021-10-14 11:26:04 -06:00
James Betker	83798887a8	Mods to support unet diffusion vocoder with conditioning	2021-10-13 21:23:18 -06:00
James Betker	33120cb35c	Add norming to discretization_loss	2021-10-06 17:10:50 -06:00
James Betker	f2977d360c	Allow attention_dim in channel attention to be specified, add converter	2021-10-05 17:29:38 -06:00
James Betker	9c0d7288ea	Discretization loss attempt	2021-10-04 20:59:21 -06:00
James Betker	66f99a159c	Rev2	2021-10-03 15:20:50 -06:00
James Betker	09f373e3b1	Add dvae with channel attention	2021-10-03 10:52:01 -06:00
James Betker	0396a9d2ca	Increase baseline codes recording across all dvae models	2021-09-30 08:09:07 -06:00
James Betker	f84ccbdfb2	Fix quantizer with balancing_heuristic	2021-09-29 14:46:05 -06:00
James Betker	4914c526dc	More cleanup	2021-09-29 14:24:49 -06:00
James Betker	6e550edfe3	Attentive dvae	2021-09-29 14:17:29 -06:00
James Betker	55b58fb67f	Clean up codebase. Remove stuff that I'm likely not going to use again (or generally failed experiments).	2021-09-29 09:21:44 -06:00
James Betker	4d1a42e944	Add switchnorm to gumbel_quantizer	2021-09-24 18:49:25 -06:00
James Betker	ac57cdc794	Add scheduling to quantizer, enable cudnn_benchmarking to be disabled	2021-09-24 17:01:36 -06:00
James Betker	3e64e847c2	Gumbel quantizer	2021-09-23 23:32:03 -06:00
James Betker	c5297ccec6	Add dvae balancing heuristic	2021-09-23 21:19:36 -06:00
James Betker	e24c619387	Fix	2021-09-23 16:07:58 -06:00
James Betker	6833048bf7	Alterations to diffusion_dvae so it can be used directly on spectrograms	2021-09-23 15:56:25 -06:00
James Betker	5c8d266d4f	chk	2021-09-17 09:15:36 -06:00
James Betker	a6544f1684	More checkpointing fixes	2021-09-16 23:12:43 -06:00
James Betker	94899d88f3	Fix overuse of checkpointing	2021-09-16 23:00:28 -06:00
James Betker	f78ce9d924	Get diffusion_dvae ready for prime time!	2021-09-16 22:43:10 -06:00
James Betker	6f48674647	Support diffusion models with extra return values & inference in diffusion_dvae	2021-09-16 10:53:46 -06:00
James Betker	0382660159	Get diffusion_dvae functional	2021-09-14 17:43:31 -06:00
James Betker	76e2c497f7	Improvements to splitter	2021-09-09 23:34:56 -06:00
James Betker	742f9b4010	Batch spleeter cleaner using GPU	2021-09-09 23:14:32 -06:00
James Betker	73b930c0f6	Add diffusion_dvae. Increase split_on_silence interval.	2021-09-09 16:22:05 -06:00
James Betker	b8f2e0f452	mydvae	2021-09-06 17:45:30 -06:00
James Betker	3e073cff85	Set kernel_size in diffusion_vocoder	2021-09-01 08:33:46 -06:00
James Betker	dabd87246d	Add unet_diffusion_vocoder	2021-08-31 14:38:33 -06:00
James Betker	909754cc27	Add find_faulty_files.py	2021-08-25 18:00:43 -06:00
James Betker	08b33c8e3a	Support silu activation	2021-08-25 09:03:14 -06:00
James Betker	67bf7f5219	dvae mods. Trying to squeeze as much performance out of this net as possible.	2021-08-25 08:55:13 -06:00
James Betker	b521d94b01	Make gpt-asr more configurable	2021-08-19 16:33:41 -06:00
James Betker	570ed327ed	Stop dataset - attempt #2	2021-08-18 18:29:38 -06:00
James Betker	17453ccbe8	Revert mods to lrdvae. They didn't really change anything.	2021-08-17 09:09:29 -06:00
James Betker	8332923f5c	Two more tools to test the audio segmentor	2021-08-17 09:09:11 -06:00
James Betker	1fede41b7b	Audio segmentor	2021-08-16 22:51:53 -06:00
James Betker	729c1fd5a9	Fix up max lengths to save memory	2021-08-15 21:29:28 -06:00
James Betker	9e47e64d5a	Add gpt_segmentor model. The idea is to specifically train a model that extracts phrases from audio clips.	2021-08-15 21:23:07 -06:00
James Betker	a826d5f658	Mods to dvae: add resblock to each layer; increase filter size for each layer; use SiLU	2021-08-15 20:54:10 -06:00
James Betker	b8bec22f1a	Fix gpt_asr inference bug	2021-08-15 20:53:42 -06:00
James Betker	a523c4f932	Auto-normalize wav files by data type	2021-08-15 09:09:51 -06:00
James Betker	98057b6516	Make lrdvae use quantized mode in eval()	2021-08-14 23:43:01 -06:00
James Betker	ad3391bd96	Fix nan issue when interpolating audio	2021-08-14 20:42:01 -06:00
James Betker	d6a73acaed	Allow processing of multiple audio sources at once from nv_tacotron_dataset	2021-08-14 16:04:05 -06:00
James Betker	007976082b	GPT_asr for inference	2021-08-14 14:37:17 -06:00
James Betker	e1bdd3f7c7	Fix gpt_asr bug. Initial implementation of beam search	2021-08-13 22:47:00 -06:00
James Betker	cdee31c60b	GPT_ASR	2021-08-13 15:02:18 -06:00
James Betker	f5a9b88ef6	tacotron cleaners: remove quotation marks. These don't really have relevance for tts or asr.	2021-08-11 16:18:44 -06:00
James Betker	20586a8edc	Fix LRDVAE bug with quantizer integration	2021-08-11 16:17:22 -06:00
James Betker	82fc69abfa	Add "pure" evaluator, which simply computes the training loss against an eval dataset	2021-08-09 14:58:35 -06:00
James Betker	080bea2f19	No, really	2021-08-09 12:02:31 -06:00
James Betker	e1ce4671e4	Apply dropout to gpt_tts, get rid of min_gpt implementation	2021-08-09 12:01:10 -06:00
James Betker	1068f53b78	Add a sampling beam search	2021-08-09 11:56:06 -06:00
James Betker	01cfae28d8	Beam search implementation in one pass? Dayyyum	2021-08-08 23:22:42 -06:00
James Betker	690d7e86d3	Fix nv_tacotron_dataset bug which incorrectly mapped filenames dammit..	2021-08-08 11:38:52 -06:00
James Betker	a2afb25e42	Fix inference, always flow full text tokens through transformer	2021-08-07 20:11:10 -06:00
James Betker	4c678172d6	ugh	2021-08-06 22:10:18 -06:00
James Betker	e723137273	Make gpttts more configurable	2021-08-06 22:08:51 -06:00
James Betker	a7496b661c	combined dvae ftw	2021-08-06 22:01:06 -06:00
James Betker	0237e96b34	Fix dvae bug	2021-08-06 14:17:01 -06:00
James Betker	0799d95af5	Use quantizer from rosinality/vqvae with openai dvae	2021-08-06 14:06:26 -06:00
James Betker	d3ace153af	Add logic for performing inference using gpt_tts with dual-encoder modes	2021-08-06 12:04:12 -06:00
James Betker	b43683b772	Add lucidrains_dvae	2021-08-06 12:03:46 -06:00
James Betker	70dcd1107f	Fix byol_model_wrapper to function with audio inputs	2021-08-05 22:20:22 -06:00
James Betker	89d15c9e74	Move gpt-tts back to lucidrains implementation. Much better performance.	2021-08-05 22:15:13 -06:00
James Betker	d120e1aa99	Add audio augmentation to wavfile_dataset, utility to test audio similarity	2021-08-05 22:14:49 -06:00
James Betker	c0f61a2e15	Rework how DVAE tokens are ordered. It might make more sense to have top tokens, then bottom tokens with top tokens having different discretized values.	2021-08-05 07:07:17 -06:00
James Betker	4017236ba9	Fix up inference for gpt_tts	2021-08-05 06:46:30 -06:00
James Betker	5037220ac7	Mods to support contrastive learning on audio files	2021-08-05 05:57:04 -06:00
James Betker	341f28dd82	It works!	2021-08-04 20:07:51 -06:00
James Betker	36c7c1fbdb	Fix training flow for NEXT TOKEN prediction instead of same token prediction doh	2021-08-04 10:28:09 -06:00
James Betker	d9936df363	Add gpt_tts dataset and implement inference: adds a script which preprocesses quantized mels given a DVAE; adds a dataset which can consume preprocessed qmels; reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens); adds inference to gpt_tts	2021-08-04 00:44:04 -06:00
James Betker	4c98b9703f	Get dalle-style TTS to "work"	2021-08-03 21:08:27 -06:00
James Betker	2814307eee	Alterations to support VQVAE on mel spectrograms	2021-08-01 07:54:21 -06:00
James Betker	0c9e75bc69	Improvements to GptTts	2021-07-31 15:57:57 -06:00
James Betker	31ee9ae262	Checkin	2021-07-30 23:07:35 -06:00
James Betker	dadc54795c	Add gpt_tts	2021-07-27 20:33:30 -06:00
James Betker	398185e109	More work on wave-diffusion	2021-07-27 05:36:17 -06:00
James Betker	49e3b310ea	Allow audio sample rate interpolation for faster training	2021-07-26 17:44:06 -06:00
James Betker	96e90e7047	Add support for a gaussian-diffusion-based wave tacotron	2021-07-26 16:27:31 -06:00
James Betker	97d7cbbc34	Additional work for audio xformer (which doesn't really do a great job)	2021-07-23 10:58:14 -06:00
James Betker	d81386c1be	Mods to support vqvae in audio mode (1d)	2021-07-20 08:36:46 -06:00
James Betker	5584cfcc7a	tacotron2 work	2021-07-14 21:41:57 -06:00
James Betker	fe0c699ced	Various fixes	2021-07-14 00:08:42 -06:00
James Betker	be2745f42d	Add waveglow & inference capabilities to audio generator	2021-07-08 23:07:36 -06:00
James Betker	1ff434218e	tacotron2, ready for prime time!	2021-07-08 22:13:44 -06:00
James Betker	86fd3ad7fd	Initial checkin of nvidia tacotron model & dataset. These two are tested; full support for training to come.	2021-07-06 11:11:35 -06:00
James Betker	afa41f1804	Allow hq color jittering and corruptions that are not included in the corruption factor	2021-06-30 09:44:46 -06:00
James Betker	6fd16ea9c8	Add meta-anomaly detection, colorjitter augmentation	2021-06-29 13:41:55 -06:00
James Betker	46e9f62be0	Add unet with latent guide. This is a diffusion network that uses both an LQ image and a reference sample HQ image that is compressed into a latent vector to perform upsampling. The hope is that we can steer the upsampling network with sample images.	2021-06-26 11:02:58 -06:00
James Betker	0ded106562	Merge remote-tracking branch 'origin/master'	2021-06-25 13:16:28 -06:00
James Betker	a57ed8e960	Various mods to support better jpeg image filtering	2021-06-25 13:16:15 -06:00
James Betker	a0ef07ddb8	Create unet_latent_guide.py	2021-06-25 11:25:14 -06:00
James Betker	e7890dc0ba	Misc fixes for diffusion nets	2021-06-21 10:38:07 -06:00
James Betker	65c474eecf	Various changes to fix testing	2021-06-11 15:31:10 -06:00
James Betker	220f11a5e4	Half channel sizes in cifar_resnet	2021-06-09 17:06:37 -06:00
James Betker	9b5f4abb91	Add fade in for hard switch	2021-06-07 18:15:09 -06:00
James Betker	108c5d829c	Fix dropout norm	2021-06-07 16:13:23 -06:00
James Betker	438217094c	Also debug distribution of switch	2021-06-07 15:36:07 -06:00
James Betker	44b09e5f20	Amplify dropout rate	2021-06-07 15:20:53 -06:00
James Betker	f0d4eb9182	Fixor	2021-06-07 11:58:36 -06:00
James Betker	c456a60466	Another go at fixing nan	2021-06-07 11:51:43 -06:00
James Betker	1c574c5bd1	Attempt to fix nan	2021-06-07 11:43:42 -06:00
James Betker	eda796985b	Try out dropout norm	2021-06-07 11:33:33 -06:00
James Betker	6c6e82406e	Pass a corruption factor through the dataset into the upsampling network. The intuition is this will help guide the network to make better informed decisions about how it performs upsampling based on how it perceives the underlying content. (I'm giving up on letting networks detect their own quality - I'm not convinced it is actually feasible)	2021-06-07 09:13:54 -06:00
James Betker	061dbcd458	Another fix to anorm	2021-06-06 15:09:49 -06:00
James Betker	9a6991e461	Fix switch norm average	2021-06-06 15:04:28 -06:00
James Betker	57e1a6a0f2	cifar: add hard routing. Also mods switched_routing to support non-pixular inputs.	2021-06-06 14:53:43 -06:00

970 Commits