DL-Art-School

Author	SHA1	Message	Date
James Betker	34fbb78671	Straight CtcCodeGenerator as an encoder	2022-02-07 15:46:46 -07:00
James Betker	65a546c4d7	Fix for tts6	2022-02-05 16:00:14 -07:00
James Betker	5ae816bead	ctc gen checkin	2022-02-05 15:59:53 -07:00
James Betker	bb3d1ab03d	More cleanup	2022-02-04 11:06:17 -07:00
James Betker	5cc342de66	Clean up	2022-02-04 11:00:42 -07:00
James Betker	8fb147e8ab	add an autoregressive ctc code generator	2022-02-04 11:00:15 -07:00
James Betker	7f4fc55344	Update SR model	2022-02-03 21:42:53 -07:00
James Betker	bc506d4bcd	Mods to unet_diffusion_tts6 to support super resolution mode	2022-02-03 19:59:39 -07:00
James Betker	4249681c4b	Mods to support an autoregressive CTC code generator	2022-02-03 19:58:54 -07:00
James Betker	8132766d38	tts6	2022-01-31 20:15:06 -07:00
James Betker	fbea6e8eac	Adjustments to diffusion networks	2022-01-30 16:14:06 -07:00
James Betker	e58dab14c3	new diffusion updates from testing	2022-01-29 11:01:01 -07:00
James Betker	935a4e853e	get rid of nil tokens in <2>	2022-01-27 22:45:57 -07:00
James Betker	a77d376ad2	rename unet diffusion tts and add 3	2022-01-27 19:56:24 -07:00
James Betker	8c255811ad	more fixes	2022-01-25 17:57:16 -07:00
James Betker	0f3ca28e39	Allow diffusion model to be trained with masking tokens	2022-01-25 14:26:21 -07:00
James Betker	d18aec793a	Revert "(re) attempt diffusion checkpointing logic" This reverts commit `b22eec8fe3`.	2022-01-22 09:14:50 -07:00
James Betker	b22eec8fe3	(re) attempt diffusion checkpointing logic	2022-01-22 08:34:40 -07:00
James Betker	8f48848f91	misc	2022-01-22 08:23:29 -07:00
James Betker	851070075a	text<->cond clip I need that universal clip..	2022-01-22 08:23:14 -07:00
James Betker	8e2439f50d	Decrease resolution requirements to 2048	2022-01-20 11:27:49 -07:00
James Betker	4af8525dc3	Adjust diffusion vocoder to allow training individual levels	2022-01-19 13:37:59 -07:00
James Betker	ac13bfefe8	use_diffuse_tts	2022-01-19 00:35:24 -07:00
James Betker	bcd8cc51e1	Enable collated data for diffusion purposes	2022-01-19 00:35:08 -07:00
James Betker	dc9cd8c206	Update use_gpt_tts to be usable with unified_voice2	2022-01-18 21:14:17 -07:00
James Betker	7b4544b83a	Add an experimental unet_diffusion_tts to perform experiments on	2022-01-18 08:38:24 -07:00
James Betker	37e4e737b5	a few fixes	2022-01-16 15:17:17 -07:00
James Betker	9100e7fa9b	Add a diffusion network that takes aligned text instead of MELs	2022-01-15 17:28:02 -07:00
James Betker	009a1e8404	Add a new diffusion_vocoder that should be trainable faster. This new one has a "cheating" top layer that does not feed down into the unet encoder, but does consume the outputs of the unet. This cheater only operates on half of the input, while the rest of the unet operates on the full input. This limits the dimensionality of this last layer, on the assumption that these last layers consume by far the most computation and memory, but do not require the full input context. Losses are only computed on half of the aggregate input.	2022-01-11 17:26:07 -07:00
James Betker	91f28580e2	fix unified_voice	2022-01-10 16:17:31 -07:00
James Betker	136744dc1d	Fixes	2022-01-10 14:32:04 -07:00
James Betker	ee3dfac2ae	unified_voice2: decouple positional embeddings and token embeddings from underlying gpt model	2022-01-10 08:14:41 -07:00
James Betker	f503d8d96b	Partially implement performers in transformer_builders	2022-01-09 22:35:03 -07:00
James Betker	ec456b6733	Revert unified_voice back to beginning. I'll be doing my work within unified_voice2	2022-01-09 22:34:30 -07:00
James Betker	f474a7ac65	unified_voice2	2022-01-09 22:32:34 -07:00
James Betker	70b17da193	Alter unified_voice to use extensible transformer (still WIP)	2022-01-08 22:18:25 -07:00
James Betker	15d9517e26	Allow bi-directional clipping	2022-01-08 22:18:04 -07:00
James Betker	438dd9ed33	fix text-voice-clip bug	2022-01-08 08:55:00 -07:00
James Betker	34774f9948	unified_voice: begin decoupling from HF GPT. I'd like to try some different (newer) transformer variants. The way to get there is softly decoupling the transformer portion of this architecture from GPT. This actually should be fairly easy.	2022-01-07 22:51:24 -07:00
James Betker	68090ac3e9	Finish up the text->voice clip model	2022-01-07 22:28:45 -07:00
James Betker	65ffe38fce	misc	2022-01-06 22:16:17 -07:00
James Betker	e7a705fe6e	Make gpt_asr_hf2 more efficient at inference	2022-01-06 10:27:10 -07:00
James Betker	525addffab	Unified: automatically clip inputs according to specified max length to improve inference time	2022-01-06 10:13:45 -07:00
James Betker	61cd351b71	update unified	2022-01-06 09:48:11 -07:00
James Betker	10fd1110be	Fix (?) use_gpt_tts for unified_voice	2022-01-05 20:09:31 -07:00
James Betker	3c4301f085	Remove dvae_arch_playground	2022-01-05 17:06:45 -07:00
James Betker	c584ba05ee	unified_voice improvements: rename max_symbols_per_phrase to max_text_tokens; remove max_total_tokens (no longer necessary); fix integration with MelEncoder	2022-01-05 17:03:53 -07:00
James Betker	38aba6f88d	Another dumdum fix	2022-01-04 15:18:25 -07:00
James Betker	963c6072bb	Add mel_encoder and solo embeddings to unified_voice	2022-01-04 15:15:58 -07:00
James Betker	2165124f19	Add GPT documentation	2022-01-01 21:00:07 -07:00

205 Commits