DL-Art-School

Author	SHA1	Message	Date
James Betker	a6f0f854b9	Fix codes when inferring from dvae	2021-10-17 22:51:17 -06:00
James Betker	d016a2fbad	Go back to vanilla flavor of diffusion	2021-10-17 17:32:46 -06:00
James Betker	23da073037	Norm decoder outputs now	2021-10-16 09:07:10 -06:00
James Betker	0edc98f6c4	Throw out the idea of conditioning on discrete codes. Oh well :(	2021-10-16 09:02:01 -06:00
James Betker	62c8c5d93e	Zero out spectrogram code inputs initially.	2021-10-15 12:10:11 -06:00
James Betker	1d0b44ebc2	More tweaks to diffusion-vocoder	2021-10-15 11:51:17 -06:00
James Betker	3b19581f9a	Allow num_resblocks to be specified per-level	2021-10-14 11:26:04 -06:00
James Betker	83798887a8	Mods to support unet diffusion vocoder with conditioning	2021-10-13 21:23:18 -06:00
James Betker	33120cb35c	Add norming to discretization_loss	2021-10-06 17:10:50 -06:00
James Betker	f2977d360c	Allow attention_dim in channel attention to be specified, add converter	2021-10-05 17:29:38 -06:00
James Betker	9c0d7288ea	Discretization loss attempt	2021-10-04 20:59:21 -06:00
James Betker	66f99a159c	Rev2	2021-10-03 15:20:50 -06:00
James Betker	09f373e3b1	Add dvae with channel attention	2021-10-03 10:52:01 -06:00
James Betker	0396a9d2ca	Increase baseline codes recording across all dvae models	2021-09-30 08:09:07 -06:00
James Betker	f84ccbdfb2	Fix quantizer with balancing_heuristic	2021-09-29 14:46:05 -06:00
James Betker	4914c526dc	More cleanup	2021-09-29 14:24:49 -06:00
James Betker	6e550edfe3	Attentive dvae	2021-09-29 14:17:29 -06:00
James Betker	55b58fb67f	Clean up codebase. Remove stuff that I'm likely not going to use again (or generally failed experiments)	2021-09-29 09:21:44 -06:00
James Betker	4d1a42e944	Add switchnorm to gumbel_quantizer	2021-09-24 18:49:25 -06:00
James Betker	ac57cdc794	Add scheduling to quantizer, enable cudnn_benchmarking to be disabled	2021-09-24 17:01:36 -06:00
James Betker	3e64e847c2	Gumbel quantizer	2021-09-23 23:32:03 -06:00
James Betker	c5297ccec6	Add dvae balancing heuristic	2021-09-23 21:19:36 -06:00
James Betker	e24c619387	Fix	2021-09-23 16:07:58 -06:00
James Betker	6833048bf7	Alterations to diffusion_dvae so it can be used directly on spectrograms	2021-09-23 15:56:25 -06:00
James Betker	5c8d266d4f	chk	2021-09-17 09:15:36 -06:00
James Betker	a6544f1684	More checkpointing fixes	2021-09-16 23:12:43 -06:00
James Betker	94899d88f3	Fix overuse of checkpointing	2021-09-16 23:00:28 -06:00
James Betker	f78ce9d924	Get diffusion_dvae ready for prime time!	2021-09-16 22:43:10 -06:00
James Betker	6f48674647	Support diffusion models with extra return values & inference in diffusion_dvae	2021-09-16 10:53:46 -06:00
James Betker	0382660159	Get diffusion_dvae functional	2021-09-14 17:43:31 -06:00
James Betker	76e2c497f7	Improvements to splitter	2021-09-09 23:34:56 -06:00
James Betker	742f9b4010	Batch spleeter cleaner using GPU	2021-09-09 23:14:32 -06:00
James Betker	73b930c0f6	Add diffusion_dvae. Increase split_on_silence interval	2021-09-09 16:22:05 -06:00
James Betker	b8f2e0f452	mydvae	2021-09-06 17:45:30 -06:00
James Betker	3e073cff85	Set kernel_size in diffusion_vocoder	2021-09-01 08:33:46 -06:00
James Betker	dabd87246d	Add unet_diffusion_vocoder	2021-08-31 14:38:33 -06:00
James Betker	909754cc27	Add find_faulty_files.py	2021-08-25 18:00:43 -06:00
James Betker	08b33c8e3a	Support silu activation	2021-08-25 09:03:14 -06:00
James Betker	67bf7f5219	dvae mods. Trying to squeeze as much performance out of this net as possible	2021-08-25 08:55:13 -06:00
James Betker	b521d94b01	Make gpt-asr more configurable	2021-08-19 16:33:41 -06:00
James Betker	570ed327ed	Stop dataset - attempt #2	2021-08-18 18:29:38 -06:00
James Betker	17453ccbe8	Revert mods to lrdvae. They didn't really change anything	2021-08-17 09:09:29 -06:00
James Betker	8332923f5c	Two more tools to test the audio segmentor	2021-08-17 09:09:11 -06:00
James Betker	1fede41b7b	Audio segmentor	2021-08-16 22:51:53 -06:00
James Betker	729c1fd5a9	Fix up max lengths to save memory	2021-08-15 21:29:28 -06:00
James Betker	9e47e64d5a	Add gpt_segmentor model. The idea is to specifically train a model that extracts phrases from audio clips.	2021-08-15 21:23:07 -06:00
James Betker	a826d5f658	Mods to dvae: add resblock to each layer; increase filter size for each layer; use SiLU	2021-08-15 20:54:10 -06:00
James Betker	b8bec22f1a	Fix gpt_asr inference bug	2021-08-15 20:53:42 -06:00
James Betker	a523c4f932	Auto-normalize wav files by data type	2021-08-15 09:09:51 -06:00
James Betker	98057b6516	Make lrdvae use quantized mode in eval()	2021-08-14 23:43:01 -06:00
James Betker	ad3391bd96	Fix nan issue when interpolating audio	2021-08-14 20:42:01 -06:00
James Betker	d6a73acaed	Allow processing of multiple audio sources at once from nv_tacotron_dataset	2021-08-14 16:04:05 -06:00
James Betker	007976082b	GPT_asr for inference	2021-08-14 14:37:17 -06:00
James Betker	e1bdd3f7c7	Fix gpt_asr bug. Initial implementation of beam search	2021-08-13 22:47:00 -06:00
James Betker	cdee31c60b	GPT_ASR	2021-08-13 15:02:18 -06:00
James Betker	f5a9b88ef6	tacotron cleaners: remove quotation marks. These don't really have relevance for tts or asr	2021-08-11 16:18:44 -06:00
James Betker	20586a8edc	Fix LRDVAE bug with quantizer integration	2021-08-11 16:17:22 -06:00
James Betker	82fc69abfa	Add "pure" evaluator, which simply computes the training loss against an eval dataset	2021-08-09 14:58:35 -06:00
James Betker	080bea2f19	No, really	2021-08-09 12:02:31 -06:00
James Betker	e1ce4671e4	Apply dropout to gpt_tts, get rid of min_gpt implementation	2021-08-09 12:01:10 -06:00
James Betker	1068f53b78	Add a sampling beam search	2021-08-09 11:56:06 -06:00
James Betker	01cfae28d8	Beam search implementation in one pass? Dayyyum	2021-08-08 23:22:42 -06:00
James Betker	690d7e86d3	Fix nv_tacotron_dataset bug which incorrectly mapped filenames. dammit..	2021-08-08 11:38:52 -06:00
James Betker	a2afb25e42	Fix inference, always flow full text tokens through transformer	2021-08-07 20:11:10 -06:00
James Betker	4c678172d6	ugh	2021-08-06 22:10:18 -06:00
James Betker	e723137273	Make gpttts more configurable	2021-08-06 22:08:51 -06:00
James Betker	a7496b661c	combined dvae ftw	2021-08-06 22:01:06 -06:00
James Betker	0237e96b34	Fix dvae bug	2021-08-06 14:17:01 -06:00
James Betker	0799d95af5	Use quantizer from rosinality/vqvae with openai dvae	2021-08-06 14:06:26 -06:00
James Betker	d3ace153af	Add logic for performing inference using gpt_tts with dual-encoder modes	2021-08-06 12:04:12 -06:00
James Betker	b43683b772	Add lucidrains_dvae	2021-08-06 12:03:46 -06:00
James Betker	70dcd1107f	Fix byol_model_wrapper to function with audio inputs	2021-08-05 22:20:22 -06:00
James Betker	89d15c9e74	Move gpt-tts back to lucidrains implementation. Much better performance.	2021-08-05 22:15:13 -06:00
James Betker	d120e1aa99	Add audio augmentation to wavfile_dataset, utility to test audio similarity	2021-08-05 22:14:49 -06:00
James Betker	c0f61a2e15	Rework how DVAE tokens are ordered. It might make more sense to have top tokens, then bottom tokens with top tokens having different discretized values.	2021-08-05 07:07:17 -06:00
James Betker	4017236ba9	Fix up inference for gpt_tts	2021-08-05 06:46:30 -06:00
James Betker	5037220ac7	Mods to support contrastive learning on audio files	2021-08-05 05:57:04 -06:00
James Betker	341f28dd82	It works!	2021-08-04 20:07:51 -06:00
James Betker	36c7c1fbdb	Fix training flow for NEXT TOKEN prediction instead of same token prediction. doh	2021-08-04 10:28:09 -06:00
James Betker	d9936df363	Add gpt_tts dataset and implement inference: adds a script which preprocesses quantized mels given a DVAE; adds a dataset which can consume preprocessed qmels; reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens); adds inference to gpt_tts	2021-08-04 00:44:04 -06:00
James Betker	4c98b9703f	Get dalle-style TTS to "work"	2021-08-03 21:08:27 -06:00
James Betker	2814307eee	Alterations to support VQVAE on mel spectrograms	2021-08-01 07:54:21 -06:00
James Betker	0c9e75bc69	Improvements to GptTts	2021-07-31 15:57:57 -06:00
James Betker	31ee9ae262	Checkin	2021-07-30 23:07:35 -06:00
James Betker	dadc54795c	Add gpt_tts	2021-07-27 20:33:30 -06:00
James Betker	398185e109	More work on wave-diffusion	2021-07-27 05:36:17 -06:00
James Betker	49e3b310ea	Allow audio sample rate interpolation for faster training	2021-07-26 17:44:06 -06:00
James Betker	96e90e7047	Add support for a gaussian-diffusion-based wave tacotron	2021-07-26 16:27:31 -06:00
James Betker	97d7cbbc34	Additional work for audio xformer (which doesn't really do a great job)	2021-07-23 10:58:14 -06:00
James Betker	d81386c1be	Mods to support vqvae in audio mode (1d)	2021-07-20 08:36:46 -06:00
James Betker	5584cfcc7a	tacotron2 work	2021-07-14 21:41:57 -06:00
James Betker	fe0c699ced	Various fixes	2021-07-14 00:08:42 -06:00
James Betker	be2745f42d	Add waveglow & inference capabilities to audio generator	2021-07-08 23:07:36 -06:00
James Betker	1ff434218e	tacotron2, ready for prime time!	2021-07-08 22:13:44 -06:00
James Betker	86fd3ad7fd	Initial checkin of nvidia tacotron model & dataset. These two are tested, full support for training to come.	2021-07-06 11:11:35 -06:00
James Betker	afa41f1804	Allow hq color jittering and corruptions that are not included in the corruption factor	2021-06-30 09:44:46 -06:00
James Betker	6fd16ea9c8	Add meta-anomaly detection, colorjitter augmentation	2021-06-29 13:41:55 -06:00
James Betker	46e9f62be0	Add unet with latent guide. This is a diffusion network that uses both a LQ image and a reference sample HQ image that is compressed into a latent vector to perform upsampling. The hope is that we can steer the upsampling network with sample images.	2021-06-26 11:02:58 -06:00
James Betker	0ded106562	Merge remote-tracking branch 'origin/master'	2021-06-25 13:16:28 -06:00
James Betker	a57ed8e960	Various mods to support better jpeg image filtering	2021-06-25 13:16:15 -06:00

886 Commits