7a0956863d | oops | 2025-03-31 21:11:43 -05:00
a1184586ef | should never have trusted mse_loss, it never works | 2025-03-31 20:59:13 -05:00
478aea0e8c | tweaks | 2025-03-28 19:49:54 -05:00
6ae282e090 | re-added noise dataloader sampler whatever for the old implementation's other tasks that require it | 2025-03-28 15:07:06 -05:00
90b3509404 | I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules | 2025-03-27 13:27:51 -05:00
09e9438941 | ugh | 2025-03-25 23:24:01 -05:00
8641c87611 | nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) | 2025-03-25 23:06:16 -05:00
d1d91295b3 | add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... | 2025-03-21 19:05:49 -05:00
589cfb0e18 | yuge speedup because of a dumb oversight | 2025-03-20 17:39:41 -05:00
8068f24e35 | cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... | 2025-03-20 15:56:15 -05:00
9a7458cf17 | fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings | 2025-03-19 22:41:48 -05:00
61de653ad9 | now causal training should work again | 2025-03-19 14:20:19 -05:00
85b9dd47c1 | ugh | 2025-03-19 13:31:50 -05:00
5479d2eacc | more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) | 2025-03-18 19:34:37 -05:00
0280e72257 | ugh | 2025-03-17 21:49:45 -05:00
b0dba9db07 | this may bite me in the ass | 2025-03-17 21:46:50 -05:00
9cfbf94b1c | config-ify the len_loss_factor | 2025-03-14 20:30:48 -05:00
ca8cc15271 | more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) | 2025-03-14 20:18:25 -05:00
ba5f3d19b4 | use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesn't (and some notes) | 2025-03-12 22:47:19 -05:00
2ccf1b5740 | actually do duration prediction | 2025-03-11 22:14:54 -05:00
5c512717a6 | len prediction for new model (and remove logit normalization since it kills inferencing) | 2025-03-11 20:33:09 -05:00
5f98543d4d | ughh | 2025-03-10 21:18:57 -05:00
8ac03aac8a | ugh | 2025-03-10 21:14:56 -05:00
5670fcb23f | hopefully the final tweaks needed for this bastard of a model | 2025-03-10 20:59:11 -05:00
00d1fed217 | another optimization (within the dataloader because the similar utterance sampler was mondo slow) | 2025-03-08 17:10:50 -06:00
5e9d1a5302 | one more time one more time (this normalization isn't a spook) | 2025-03-07 19:32:42 -06:00
93044829af | one more time (could have sworn i tested it with batch size > 1) | 2025-03-07 19:14:33 -06:00
6cea840710 | oops | 2025-03-07 18:57:25 -06:00
dbd34b6430 | add specialized calc_loss because schizo | 2025-03-07 18:44:11 -06:00
8d848ed549 | handle case of dropping cond for segment mask | 2025-03-07 14:11:58 -06:00
6afc2b7526 | gut feeling to change the attention mask | 2025-03-07 13:51:59 -06:00
ec87308d75 | final tweaks before training this meme 44khz model for the 3rd time | 2025-03-06 15:31:15 -06:00
5cd71ef238 | QoL so I can stop having to manually inject different configs | 2025-03-06 14:48:14 -06:00
0d809561c6 | accuracy k=1 and k=80 because im probably dumb for k=10 as the default since it does not represent any usecase | 2025-03-05 16:35:34 -06:00
2fb2b732fc | wow that was fast | 2025-03-04 23:17:18 -06:00
0451f75e33 | now that the new model seems a little more promising, i can re-document things non-cynically | 2025-03-03 13:21:41 -06:00
3f1070f575 | tweaks | 2025-03-02 22:36:25 -06:00
17094b8002 | reticulating splines | 2025-03-01 17:48:51 -06:00
b97faa8173 | fixes... | 2025-02-28 18:53:07 -06:00
4e7d885542 | lol | 2025-02-28 18:06:41 -06:00
a174c33db6 | a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) | 2025-02-28 17:56:50 -06:00
09d82a26fe | ugh | 2025-02-28 01:06:38 -06:00
93feb5660f | do not like that | 2025-02-27 23:59:56 -06:00
f4f435d7f5 | when you already had these ideas to stabilize training but you just ignored them | 2025-02-27 23:39:20 -06:00
0a45c9c042 | fix attention backend not being used | 2025-02-27 21:38:38 -06:00
b8e9f3d785 | maybe this will work | 2025-02-27 20:42:12 -06:00
01e96bafc9 | ugh | 2025-02-27 19:05:32 -06:00
ceecac6ffe | I think I made resp_parallel_training=True faster with loss factoring? | 2025-02-26 23:13:32 -06:00
cbd4d7d7f4 | ugh | 2025-02-26 21:31:10 -06:00
2ea387c08a | segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) | 2025-02-26 21:26:13 -06:00