DL-Art-School

Author	SHA1	Message	Date
James Betker	668876799d	unet_diffusion_tts7	2022-02-20 15:22:38 -07:00
James Betker	0872e17e60	unified_voice mods	2022-02-19 20:37:35 -07:00
James Betker	7b12799370	Reformat mel_text_clip for use in eval	2022-02-19 20:37:26 -07:00
James Betker	bcba65c539	DataParallel Fix	2022-02-19 20:36:35 -07:00
James Betker	34001ad765	et	2022-02-18 18:52:33 -07:00
James Betker	baf7b65566	Attempt to make w2v play with DDP AND checkpointing	2022-02-18 18:47:11 -07:00
James Betker	f3776f1992	reset ctc loss from "mean" to "sum"	2022-02-17 22:00:58 -07:00
James Betker	2b20da679c	make spec_augment a parameter	2022-02-17 20:22:05 -07:00
James Betker	a813fbed9c	Update to evaluator	2022-02-17 17:30:33 -07:00
James Betker	e1d71e1bd5	w2v_wrapper: get rid of ctc attention mask	2022-02-15 20:54:40 -07:00
James Betker	79e8f36d30	Convert CLIP models into new folder	2022-02-15 20:53:07 -07:00
James Betker	8f767b8b4f	...	2022-02-15 07:08:17 -07:00
James Betker	29e07913a8	Fix	2022-02-15 06:58:11 -07:00
James Betker	dd585df772	LAMB optimizer	2022-02-15 06:48:13 -07:00
James Betker	2bdb515068	A few mods to make wav2vec2 trainable with DDP on DLAS	2022-02-15 06:28:54 -07:00
James Betker	52b61b9f77	Update scripts and attempt to figure out how UnifiedVoice could be used to produce CTC codes	2022-02-13 20:48:06 -07:00
James Betker	a4f1641eea	Add & refine WER evaluator for w2v	2022-02-13 20:47:29 -07:00
James Betker	e16af944c0	BSO fix	2022-02-12 20:01:04 -07:00
James Betker	29534180b2	w2v fine tuner	2022-02-12 20:00:59 -07:00
James Betker	0c3cc5ebad	use script updates to fix output size disparities	2022-02-12 20:00:46 -07:00
James Betker	15fd60aad3	Allow EMA training to be disabled	2022-02-12 20:00:23 -07:00
James Betker	3252972057	ctc_code_gen mods	2022-02-12 19:59:54 -07:00
James Betker	35170c77b3	fix sweep	2022-02-11 11:43:11 -07:00
James Betker	c6b6d120fe	fix ranking	2022-02-11 11:34:57 -07:00
James Betker	095944569c	deep_update dicts	2022-02-11 11:32:25 -07:00
James Betker	ab1f6e8ac6	deepcopy map	2022-02-11 11:29:32 -07:00
James Betker	496fb81997	use fork instead	2022-02-11 11:22:25 -07:00
James Betker	4abc094b47	fix train bug	2022-02-11 11:18:15 -07:00
James Betker	006add64c5	sweep fix	2022-02-11 11:17:08 -07:00
James Betker	102142d1eb	f	2022-02-11 11:05:13 -07:00
James Betker	40b08a52d0	dafuk	2022-02-11 11:01:31 -07:00
James Betker	f6a7f12cad	Remove broken evaluator	2022-02-11 11:00:29 -07:00
James Betker	46b97049dc	Fix eval	2022-02-11 10:59:32 -07:00
James Betker	5175b7d91a	training sweeper checkin	2022-02-11 10:46:37 -07:00
James Betker	302ac8652d	Undo mask during training	2022-02-11 09:35:12 -07:00
James Betker	618a20412a	new rev of ctc_code_gen with surrogate LM loss	2022-02-10 23:09:57 -07:00
James Betker	d1d1ae32a1	audio diffusion frechet distance measurement!	2022-02-10 22:55:46 -07:00
James Betker	23a310b488	Fix BSO	2022-02-10 20:54:51 -07:00
James Betker	1e28e02f98	BSO improvement to make it work with distributed optimizers	2022-02-10 09:53:13 -07:00
James Betker	836eb08afb	Update BSO to use the proper step size	2022-02-10 09:44:15 -07:00
James Betker	820a29f81e	ctc code gen mods	2022-02-10 09:44:01 -07:00
James Betker	ac9417b956	ctc_code_gen: mask out all padding tokens	2022-02-09 17:26:30 -07:00
James Betker	a930f2576e	Begin a migration to specifying training rate on megasamples instead of arbitrary "steps". This should help me greatly in tuning models. It's also necessary now that batch size isn't really respected; we simply step once the gradient direction becomes unstable.	2022-02-09 17:25:05 -07:00
James Betker	93ca619267	script updates	2022-02-09 14:26:52 -07:00
James Betker	ddb77ef502	ctc_code_gen: use a mean() on the ConditioningEncoder	2022-02-09 14:26:44 -07:00
James Betker	3d946356f8	batch_size_optimizer works. sweet! no more tuning batch sizes.	2022-02-09 14:26:23 -07:00
James Betker	18938248e4	Add batch_size_optimizer support	2022-02-08 23:51:31 -07:00
James Betker	9e9ae328f2	mild updates	2022-02-08 23:51:17 -07:00
James Betker	ff35d13b99	Use non-uniform noise in diffusion_tts6	2022-02-08 07:27:41 -07:00
James Betker	f44b064c5e	Update scripts	2022-02-07 19:43:18 -07:00
James Betker	34fbb78671	Straight CtcCodeGenerator as an encoder	2022-02-07 15:46:46 -07:00
James Betker	c24682c668	Record load times in fast_paired_dataset	2022-02-07 15:45:38 -07:00
James Betker	65a546c4d7	Fix for tts6	2022-02-05 16:00:14 -07:00
James Betker	5ae816bead	ctc gen checkin	2022-02-05 15:59:53 -07:00
James Betker	bb3d1ab03d	More cleanup	2022-02-04 11:06:17 -07:00
James Betker	5cc342de66	Clean up	2022-02-04 11:00:42 -07:00
James Betker	8fb147e8ab	add an autoregressive ctc code generator	2022-02-04 11:00:15 -07:00
James Betker	7f4fc55344	Update SR model	2022-02-03 21:42:53 -07:00
James Betker	de1a1d501a	Move audio injectors into their own file	2022-02-03 21:42:37 -07:00
James Betker	687393de59	Add a better split_on_silence (processing_pipeline). Going to extend this a bit more going forwards to support the entire pipeline.	2022-02-03 20:00:26 -07:00
James Betker	1d29999648	Updates to the TTS production scripts	2022-02-03 20:00:01 -07:00
James Betker	bc506d4bcd	Mods to unet_diffusion_tts6 to support super resolution mode	2022-02-03 19:59:39 -07:00
James Betker	4249681c4b	Mods to support an autoregressive CTC code generator	2022-02-03 19:58:54 -07:00
James Betker	8132766d38	tts6	2022-01-31 20:15:06 -07:00
James Betker	fbea6e8eac	Adjustments to diffusion networks	2022-01-30 16:14:06 -07:00
James Betker	e58dab14c3	new diffusion updates from testing	2022-01-29 11:01:01 -07:00
James Betker	935a4e853e	get rid of nil tokens in <2>	2022-01-27 22:45:57 -07:00
James Betker	0152174c0e	Add wandb_step_factor argument	2022-01-27 19:58:58 -07:00
James Betker	e0e36ed98c	Update use_diffuse_tts	2022-01-27 19:57:28 -07:00
James Betker	a77d376ad2	rename unet diffusion tts and add 3	2022-01-27 19:56:24 -07:00
James Betker	7badbf1b4d	update usage scripts	2022-01-25 17:57:26 -07:00
James Betker	8c255811ad	more fixes	2022-01-25 17:57:16 -07:00
James Betker	0f3ca28e39	Allow diffusion model to be trained with masking tokens	2022-01-25 14:26:21 -07:00
James Betker	798ed7730a	i like wasting time	2022-01-24 18:12:08 -07:00
James Betker	fc09cff4b3	angry	2022-01-24 18:09:29 -07:00
James Betker	cc0d9f7216	Fix	2022-01-24 18:05:45 -07:00
James Betker	3a9e3a9db3	consolidate state	2022-01-24 17:59:31 -07:00
James Betker	dfef34ba39	Load ema to cpu memory if specified	2022-01-24 15:08:29 -07:00
James Betker	49edffb6ad	Revise device mapping	2022-01-24 15:08:13 -07:00
James Betker	33511243d5	load model state dicts into the correct device. It's not clear to me that this will make a huge difference, but it's a good idea anyways	2022-01-24 14:40:09 -07:00
James Betker	3e16c509f6	Misc fixes	2022-01-24 14:31:43 -07:00
James Betker	e2ed0adbd8	use_diffuse_tts updates	2022-01-24 14:31:28 -07:00
James Betker	e420df479f	Allow steps to specify which state keys to carry forward (reducing memory utilization)	2022-01-24 11:01:27 -07:00
James Betker	62475005e4	Sort data items in descending order, which I suspect will improve performance because we will hit GC less	2022-01-23 19:05:32 -07:00
James Betker	d18aec793a	Revert "(re) attempt diffusion checkpointing logic" This reverts commit `b22eec8fe3`.	2022-01-22 09:14:50 -07:00
James Betker	b22eec8fe3	(re) attempt diffusion checkpointing logic	2022-01-22 08:34:40 -07:00
James Betker	8f48848f91	misc	2022-01-22 08:23:29 -07:00
James Betker	851070075a	text<->cond clip. I need that universal clip..	2022-01-22 08:23:14 -07:00
James Betker	8ada52ccdc	Update LR layers to checkpoint better	2022-01-22 08:22:57 -07:00
James Betker	ce929a6b3f	Allow grad scaler to be enabled even in fp32 mode	2022-01-21 23:13:24 -07:00
James Betker	91b4b240ac	dont pickle unique files	2022-01-21 00:02:06 -07:00
James Betker	7fef7fb9ff	Update fast_paired_dataset to report how many audio files it is actually using	2022-01-20 21:49:38 -07:00
James Betker	ed35cfe393	Update inference scripts	2022-01-20 11:28:50 -07:00
James Betker	20312211e0	Fix bug in code alignment	2022-01-20 11:28:12 -07:00
James Betker	8e2439f50d	Decrease resolution requirements to 2048	2022-01-20 11:27:49 -07:00
James Betker	4af8525dc3	Adjust diffusion vocoder to allow training individual levels	2022-01-19 13:37:59 -07:00
James Betker	ac13bfefe8	use_diffuse_tts	2022-01-19 00:35:24 -07:00
James Betker	bcd8cc51e1	Enable collated data for diffusion purposes	2022-01-19 00:35:08 -07:00
James Betker	dc9cd8c206	Update use_gpt_tts to be usable with unified_voice2	2022-01-18 21:14:17 -07:00
James Betker	7b4544b83a	Add an experimental unet_diffusion_tts to perform experiments on	2022-01-18 08:38:24 -07:00
James Betker	b6190e96b2	fast_paired	2022-01-17 15:46:02 -07:00
James Betker	1d30d79e34	De-specify fast-paired-dataset	2022-01-16 21:20:00 -07:00
James Betker	2b36ca5f8e	Revert paired back	2022-01-16 21:10:46 -07:00
James Betker	ad3e7df086	Split the fast random into its own new dataset	2022-01-16 21:10:11 -07:00
James Betker	7331862755	Updated paired to randomly index data, offsetting memory costs and speeding up initialization	2022-01-16 21:09:22 -07:00
James Betker	37e4e737b5	a few fixes	2022-01-16 15:17:17 -07:00
James Betker	35db5ebf41	paired_voice_audio_dataset - aligned codes support	2022-01-15 17:38:26 -07:00
James Betker	3f177cd2b3	requirements	2022-01-15 17:28:59 -07:00
James Betker	b398ecca01	wer fix	2022-01-15 17:28:17 -07:00
James Betker	9100e7fa9b	Add a diffusion network that takes aligned text instead of MELs	2022-01-15 17:28:02 -07:00
James Betker	87c83e4957	update wer script	2022-01-13 17:08:49 -07:00
James Betker	009a1e8404	Add a new diffusion_vocoder that should be trainable faster. This new one has a "cheating" top layer, that does not feed down into the unet encoder, but does consume the outputs of the unet. This cheater only operates on half of the input, while the rest of the unet operates on the full input. This limits the dimensionality of this last layer, on the assumption that these last layers consume by far the most computation and memory, but do not require the full input context. Losses are only computed on half of the aggregate input.	2022-01-11 17:26:07 -07:00
James Betker	d4e27ccf62	misc updates	2022-01-11 16:25:40 -07:00
James Betker	91f28580e2	fix unified_voice	2022-01-10 16:17:31 -07:00
James Betker	136744dc1d	Fixes	2022-01-10 14:32:04 -07:00
James Betker	ee3dfac2ae	unified_voice2: decouple positional embeddings and token embeddings from underlying gpt model	2022-01-10 08:14:41 -07:00
James Betker	f503d8d96b	Partially implement performers in transformer_builders	2022-01-09 22:35:03 -07:00
James Betker	ec456b6733	Revert unified_voice back to beginning. I'll be doing my work within unified_voice2	2022-01-09 22:34:30 -07:00
James Betker	432073c5ca	Make performer code functional	2022-01-09 22:32:50 -07:00
James Betker	f474a7ac65	unified_voice2	2022-01-09 22:32:34 -07:00
James Betker	c075fe72e2	import performer repo	2022-01-09 22:10:07 -07:00
James Betker	7de3874f15	Make dalle transformer checkpointable	2022-01-09 19:14:35 -07:00
James Betker	70b17da193	Alter unified_voice to use extensible transformer (still WIP)	2022-01-08 22:18:25 -07:00
James Betker	15d9517e26	Allow bi-directional clipping	2022-01-08 22:18:04 -07:00
James Betker	894d245062	More zero_grad fixes	2022-01-08 20:31:19 -07:00
James Betker	8bade38180	Add generic CLIP model based off of x_clip	2022-01-08 19:08:01 -07:00
James Betker	2a9a25e6e7	Fix likely defective nan grad recovery	2022-01-08 18:24:58 -07:00
James Betker	438dd9ed33	fix text-voice-clip bug	2022-01-08 08:55:00 -07:00
James Betker	34774f9948	unified_voice: begin decoupling from HF GPT. I'd like to try some different (newer) transformer variants. The way to get there is softly decoupling the transformer portion of this architecture from GPT. This actually should be fairly easy.	2022-01-07 22:51:24 -07:00
James Betker	1f6a5310b8	More fixes to use_gpt_tts	2022-01-07 22:30:55 -07:00
James Betker	68090ac3e9	Finish up the text->voice clip model	2022-01-07 22:28:45 -07:00
James Betker	65ffe38fce	misc	2022-01-06 22:16:17 -07:00
James Betker	6706591d3d	Fix dataset	2022-01-06 15:24:37 -07:00
James Betker	f4484fd155	Add "dataset_debugger" support. This allows the datasets themselves to compile statistics and report them via tensorboard and wandb.	2022-01-06 12:38:20 -07:00
James Betker	f3cab45658	Revise audio datasets to include interesting statistics in batch. Stats include: how many indices were skipped to retrieve a given index; whether or not a conditioning input was actually the file itself.	2022-01-06 11:15:16 -07:00
James Betker	06c1093090	Remove collating from paired_voice_audio_dataset. This will now be done at the model level, which is more efficient	2022-01-06 10:29:39 -07:00
James Betker	e7a705fe6e	Make gpt_asr_hf2 more efficient at inference	2022-01-06 10:27:10 -07:00
James Betker	5e1d1da2e9	Clean paired_voice	2022-01-06 10:26:53 -07:00
James Betker	525addffab	Unified: automatically clip inputs according to specified max length to improve inference time	2022-01-06 10:13:45 -07:00
James Betker	61cd351b71	update unified	2022-01-06 09:48:11 -07:00
James Betker	10fd1110be	Fix (?) use_gpt_tts for unified_voice	2022-01-05 20:09:31 -07:00
James Betker	3c4301f085	Remove dvae_arch_playground	2022-01-05 17:06:45 -07:00
James Betker	a63a17e48f	Remove deepspeech models	2022-01-05 17:05:13 -07:00
James Betker	c584ba05ee	unified_voice improvements: rename max_symbols_per_phrase to max_text_tokens; remove max_total_tokens (no longer necessary); fix integration with MelEncoder	2022-01-05 17:03:53 -07:00
James Betker	50d267ab1a	misc	2022-01-05 17:01:22 -07:00
James Betker	0fe34f57d1	Use torch resampler	2022-01-05 15:47:22 -07:00
James Betker	38aba6f88d	Another dumdum fix	2022-01-04 15:18:25 -07:00
James Betker	963c6072bb	Add mel_encoder and solo embeddings to unified_voice	2022-01-04 15:15:58 -07:00
James Betker	2165124f19	Add GPT documentation	2022-01-01 21:00:07 -07:00
James Betker	2635412291	doh	2022-01-01 14:29:59 -07:00


1598 Commits