DL-Art-School

Author	SHA1	Message	Date
James Betker	c737632eae	Train and use a bespoke tokenizer	2021-12-22 15:06:14 -07:00
James Betker	a9629f7022	Try out using the GPT tokenizer rather than nv_tacotron. This results in a significant compression of the text domain; I'm curious what the effect on speech quality will be.	2021-12-22 14:03:18 -07:00
James Betker	ced81a760b	restore nv_tacotron	2021-12-22 13:48:53 -07:00
James Betker	7bf4f9f580	duplicate nvtacotron	2021-12-22 13:48:30 -07:00
James Betker	9e8a9bf6ca	Various fixes to gpt_tts_hf	2021-12-16 23:28:44 -07:00
James Betker	31fc693a8a	dafsdf	2021-12-02 22:55:36 -07:00
James Betker	040d998922	maasd	2021-12-02 22:53:48 -07:00
James Betker	cc10e7e7e8	Add tsv loader	2021-12-02 22:43:07 -07:00
James Betker	702607556d	nv_tacotron_dataset: allow it to load conditioning signals	2021-12-02 22:14:44 -07:00
James Betker	0604060580	Finish up mods for next version of GptAsrHf	2021-11-20 21:33:49 -07:00
James Betker	9b3c3b1227	use sets instead of list ops	2021-11-07 20:45:57 -07:00
James Betker	722d3dbdc2	f	2021-11-07 18:52:05 -07:00
James Betker	18b1de9b2c	Add exclusion_lists to unsupervised_audio_dataset	2021-11-07 18:46:47 -07:00
James Betker	fd14746bf8	badtimes	2021-11-03 00:33:38 -06:00
James Betker	2fa80486de	tacotron_dataset: recover gracefully	2021-11-03 00:31:50 -06:00
James Betker	af51d00dee	Load wav files from voxpopuli instead of oggs	2021-11-02 09:32:26 -06:00
James Betker	f7d0901ce6	Decouple MEL from nv_tacotron_dataset	2021-10-31 15:01:38 -06:00
James Betker	b8b268b5f6	Misc	2021-10-31 14:29:23 -06:00
James Betker	579f0a70ee	Move UnsupervisedAudioDataset to use my new mp3 loader	2021-10-28 22:33:12 -06:00
James Betker	5d714bc566	Add deepspeech model and support for decoding with it	2021-10-27 13:09:46 -06:00
James Betker	21b6daa0ed	Introduce clip resampling	2021-10-26 10:42:23 -06:00
James Betker	c3421b7f6d	Dataset work for audio quality processor	2021-10-24 09:09:34 -06:00
James Betker	06ea6191a9	Initial implementation of audio_with_noise dataset	2021-10-21 16:45:19 -06:00
James Betker	d016a2fbad	Go back to vanilla flavor of diffusion	2021-10-17 17:32:46 -06:00
James Betker	6833048bf7	Alterations to diffusion_dvae so it can be used directly on spectrograms	2021-09-23 15:56:25 -06:00
James Betker	359e9e27a7	unsupervised_audio_dataset: try to recover from failures of audio2numpy	2021-09-17 15:25:57 -06:00
James Betker	f78ce9d924	Get diffusion_dvae ready for prime time!	2021-09-16 22:43:10 -06:00
James Betker	1197ae1928	Misc	2021-09-16 10:53:56 -06:00
James Betker	8d9857f33d	More fixes	2021-09-14 20:45:05 -06:00
James Betker	9a9c90660f	Fixes	2021-09-14 18:29:17 -06:00
James Betker	e513052fca	Add unsupervised_audio_dataset	2021-09-14 17:43:16 -06:00
James Betker	b8f2e0f452	mydvae	2021-09-06 17:45:30 -06:00
James Betker	30cd33fe44	another fix	2021-08-31 14:46:46 -06:00
James Betker	8810d3de97	fix wavfile_dataset	2021-08-31 14:45:29 -06:00
James Betker	dabd87246d	Add unet_diffusion_vocoder	2021-08-31 14:38:33 -06:00
James Betker	570ed327ed	Stop dataset - attempt #2	2021-08-18 18:29:38 -06:00
James Betker	8332923f5c	Two more tools to test the audio segmentor	2021-08-17 09:09:11 -06:00
James Betker	93e903af15	Rework wavfile dataset to be usable for things other than augments	2021-08-16 22:52:35 -06:00
James Betker	d7f30232c3	Oh yeah	2021-08-16 22:52:15 -06:00
James Betker	4c01d82265	Fix for voxpopuli	2021-08-16 22:52:05 -06:00
James Betker	1fede41b7b	Audio segmentor	2021-08-16 22:51:53 -06:00
James Betker	2d3372054d	Add support for voxpopuli to nv_tacotron_dataset	2021-08-16 17:13:40 -06:00
James Betker	3580c52eac	Fix up wavfile_dataset to be able to provide a full clip	2021-08-15 20:53:26 -06:00
James Betker	a523c4f932	Auto-normalize wav files by data type	2021-08-15 09:09:51 -06:00
James Betker	c28f657ab8	Allow usage of pre-rendered mels saved to npy files	2021-08-14 23:38:15 -06:00
James Betker	ad3391bd96	Fix nan issue when interpolating audio	2021-08-14 20:42:01 -06:00
James Betker	769f0acc53	Moar fix	2021-08-14 17:23:15 -06:00
James Betker	3d2e724083	Fix audio ranging problem	2021-08-14 17:18:55 -06:00
James Betker	d6a73acaed	Allow processing of multiple audio sources at once from nv_tacotron_dataset	2021-08-14 16:04:05 -06:00
James Betker	007976082b	GPT_asr for inference	2021-08-14 14:37:17 -06:00
James Betker	72622b4d61	Allow saving mel strips as files from the dataset implementation	2021-08-13 22:46:41 -06:00
James Betker	cfd284f425	Fix up some stuff that allows the MEL to be computed on-GPU	2021-08-13 18:35:55 -06:00
James Betker	fff1a59e08	max/min mel invalid fix	2021-08-13 09:36:31 -06:00
James Betker	4b2946e581	More fix	2021-08-12 15:51:23 -06:00
James Betker	4c76257c71	Dont require collation for nv_tacotron	2021-08-12 15:44:55 -06:00
James Betker	5b07d3b623	Found error that I was trying to fix with reload=True	2021-08-12 15:22:34 -06:00
James Betker	430b650a34	......	2021-08-12 10:31:10 -06:00
James Betker	b35d6ae028	Print some metrics from tacotron dataset when it croaks	2021-08-12 09:21:12 -06:00
James Betker	0c4d6b1916	Just offer generic re-load for nv-tacotron	2021-08-12 09:09:12 -06:00
James Betker	154f5aa73c	Fix annoying warning and add to requirements	2021-08-11 17:32:06 -06:00
James Betker	f04a7bdf63	Bug fixes for tacotron dataset on mozilla cv: support a max mel length (mozilla cv has some tracks that are basically unbounded); don't fail on low sample rates (mozilla cv has some of those)	2021-08-11 16:17:03 -06:00
James Betker	2d3f0cc33c	nv_tacotron_dataset - Allow training on mozilla cv	2021-08-11 13:34:31 -06:00
James Betker	d0c74278bf	Enable multiple wavfile paths to be specified, fix eps bug in mp3 splitter	2021-08-11 08:46:02 -06:00
James Betker	e19c00398e	More improvements to random_mp3_splitter	2021-08-09 21:31:12 -06:00
James Betker	74342b860b	Revert "Undo forced text padding". This reverts commit `83ab5e6a00`.	2021-08-09 11:56:34 -06:00
James Betker	d4e33bf15f	Fixes to the mp3 splitter	2021-08-09 11:55:46 -06:00
James Betker	4100469902	Add a tool to split mp3 files into arbitrary chunks of wav files	2021-08-08 23:23:13 -06:00
James Betker	83ab5e6a00	Undo forced text padding	2021-08-08 11:42:20 -06:00
James Betker	690d7e86d3	Fix nv_tacotron_dataset bug which incorrectly mapped filenames dammit..	2021-08-08 11:38:52 -06:00
James Betker	a2afb25e42	Fix inference, always flow full text tokens through transformer	2021-08-07 20:11:10 -06:00
James Betker	b43683b772	Add lucidrains_dvae	2021-08-06 12:03:46 -06:00
James Betker	62c7570512	Constrain wav_aug a bit more	2021-08-06 08:19:38 -06:00
James Betker	f126040da2	Undo noise first	2021-08-05 23:24:38 -06:00
James Betker	908ef5495f	Add noise first to audio_aug	2021-08-05 23:22:44 -06:00
James Betker	d6007c6de1	dataset fixes	2021-08-05 23:12:59 -06:00
James Betker	d120e1aa99	Add audio augmentation to wavfile_dataset, utility to test audio similarity	2021-08-05 22:14:49 -06:00
James Betker	4017236ba9	Fix up inference for gpt_tts	2021-08-05 06:46:30 -06:00
James Betker	5037220ac7	Mods to support contrastive learning on audio files	2021-08-05 05:57:04 -06:00
James Betker	341f28dd82	It works!	2021-08-04 20:07:51 -06:00
James Betker	d9936df363	Add gpt_tts dataset and implement inference: adds a script which preprocesses quantized mels given a DVAE; adds a dataset which can consume preprocessed qmels; reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens); adds inference to gpt_tts	2021-08-04 00:44:04 -06:00
James Betker	dadc54795c	Add gpt_tts	2021-07-27 20:33:30 -06:00
James Betker	49e3b310ea	Allow audio sample rate interpolation for faster training	2021-07-26 17:44:06 -06:00
James Betker	96e90e7047	Add support for a gaussian-diffusion-based wave tacotron	2021-07-26 16:27:31 -06:00
James Betker	d81386c1be	Mods to support vqvae in audio mode (1d)	2021-07-20 08:36:46 -06:00
James Betker	1ff434218e	tacotron2, ready for prime time!	2021-07-08 22:13:44 -06:00
James Betker	86fd3ad7fd	Initial checkin of nvidia tacotron model & dataset. These two are tested; full support for training to come.	2021-07-06 11:11:35 -06:00
James Betker	afa41f1804	Allow hq color jittering and corruptions that are not included in the corruption factor	2021-06-30 09:44:46 -06:00
James Betker	6fd16ea9c8	Add meta-anomaly detection, colorjitter augmentation	2021-06-29 13:41:55 -06:00
James Betker	46e9f62be0	Add unet with latent guide. This is a diffusion network that uses both a LQ image and a reference sample HQ image that is compressed into a latent vector to perform upsampling. The hope is that we can steer the upsampling network with sample images.	2021-06-26 11:02:58 -06:00
James Betker	0ded106562	Merge remote-tracking branch 'origin/master'	2021-06-25 13:16:28 -06:00
James Betker	a57ed8e960	Various mods to support better jpeg image filtering	2021-06-25 13:16:15 -06:00
James Betker	61e7ca39cd	Update image_folder_dataset.py	2021-06-25 11:48:31 -06:00
James Betker	6b32c87dcb	Try to make diffusion fid more deterministic	2021-06-14 09:27:43 -06:00
James Betker	65c474eecf	Various changes to fix testing	2021-06-11 15:31:10 -06:00
James Betker	6c6e82406e	Pass a corruption factor through the dataset into the upsampling network. The intuition is this will help guide the network to make better informed decisions about how it performs upsampling based on how it perceives the underlying content. (I'm giving up on letting networks detect their own quality; I'm not convinced it is actually feasible.)	2021-06-07 09:13:54 -06:00
James Betker	fb405d9ef1	CIFAR stuff: extract coarse labels for the CIFAR dataset; add simple resnet that branches lower layers based on coarse labels; some other cleanup	2021-06-05 14:16:02 -06:00
James Betker	e6c537824a	Allow validation for ce	2021-06-04 21:21:04 -06:00
James Betker	7c251af7a8	Support cifar100 with resnet	2021-06-04 17:29:07 -06:00
James Betker	6084915af8	Support gaussian diffusion models. Adds support for GD models, courtesy of some maths from openai. Also: fixes requirement for eval{} even when it isn't being used; adds support for denormalizing an imagenet norm	2021-06-02 21:47:32 -06:00
James Betker	45bc76ba92	Fixes and mods to support training classifiers on imagenet	2021-06-01 17:25:24 -06:00

306 Commits