Commit Graph

846 Commits

Author SHA1 Message Date
mrq 7a0956863d oops 2025-03-31 21:11:43 -05:00
mrq a1184586ef should never have trusted mse_loss, it never works 2025-03-31 20:59:13 -05:00
mrq 99f251c768 slight tweaks to condition-less NS/SR 2025-03-30 10:37:40 -05:00
mrq 478aea0e8c tweaks 2025-03-28 19:49:54 -05:00
mrq 6ae282e090 re-added noise dataloader sampler whatever for the old implementation's other tasks that require it 2025-03-28 15:07:06 -05:00
mrq 90b3509404 I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules 2025-03-27 13:27:51 -05:00
mrq 2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) 2025-03-27 00:51:41 -05:00
mrq 4d777b5618 add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) 2025-03-26 12:08:47 -05:00
mrq 09e9438941 ugh 2025-03-25 23:24:01 -05:00
mrq 8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) 2025-03-25 23:06:16 -05:00
mrq aa8b32d97e added more notes (although I could have sworn I have had more notes that i can't recall) 2025-03-25 18:53:06 -05:00
mrq df5b870908 added remark about not using sliding attention 2025-03-22 12:44:34 -05:00
mrq 02a8bcbe29 fixed errant index error (although it makes me wonder if my segmented masking is still flawed) 2025-03-21 23:41:34 -05:00
mrq d1d91295b3 add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... 2025-03-21 19:05:49 -05:00
mrq 589cfb0e18 yuge speedup because of a dumb oversight 2025-03-20 17:39:41 -05:00
mrq 8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... 2025-03-20 15:56:15 -05:00
mrq 9a7458cf17 fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings 2025-03-19 22:41:48 -05:00
mrq 61de653ad9 now causal training should work again 2025-03-19 14:20:19 -05:00
mrq 85b9dd47c1 ugh 2025-03-19 13:31:50 -05:00
mrq 81acd565b3 re-enable these 2025-03-18 20:59:33 -05:00
mrq 5479d2eacc more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) 2025-03-18 19:34:37 -05:00
mrq 9a8a8e3195 off by one bateman 2025-03-18 08:40:43 -05:00
mrq 0280e72257 ugh 2025-03-17 21:49:45 -05:00
mrq b0dba9db07 this may bite me in the ass 2025-03-17 21:46:50 -05:00
mrq 2dfef693c4 comments for clarity 2025-03-16 11:30:23 -05:00
mrq c5475ebc91 another dataloader optimization 2025-03-15 20:18:58 -05:00
mrq bee2688dea ugh 2025-03-15 16:50:21 -05:00
mrq 2053580838 updated dataloader to hopefully reduce RAM usage 2025-03-15 13:14:37 -05:00
mrq 9cfbf94b1c config-ify the len_loss_factor 2025-03-14 20:30:48 -05:00
mrq ca8cc15271 more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) 2025-03-14 20:18:25 -05:00
mrq 6ee505cffd fixed dac 2025-03-12 23:17:27 -05:00
mrq ba5f3d19b4 use the FSQ-targeted encoder/decodede whole-ly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesnt (and some notes) 2025-03-12 22:47:19 -05:00
mrq 2ccf1b5740 actually do duration prediction 2025-03-11 22:14:54 -05:00
mrq 5c512717a6 len prediction for new model (and remove logit normalization since it kills inferencing) 2025-03-11 20:33:09 -05:00
mrq 5f98543d4d ughh 2025-03-10 21:18:57 -05:00
mrq 8ac03aac8a ugh 2025-03-10 21:14:56 -05:00
mrq 5670fcb23f hopefully the final tweaks needed for this bastard of a model 2025-03-10 20:59:11 -05:00
mrq 00d1fed217 another optimization (within the dataloader because the similar utterance sampler was mondo slow) 2025-03-08 17:10:50 -06:00
mrq 5e9d1a5302 one more time one more time (this normalization isn't a spook) 2025-03-07 19:32:42 -06:00
mrq 93044829af one more time (could have sworn i tested it with batch size > 1) 2025-03-07 19:14:33 -06:00
mrq 6cea840710 oops 2025-03-07 18:57:25 -06:00
mrq dbd34b6430 add specialized calc_loss because schizo 2025-03-07 18:44:11 -06:00
mrq 8d848ed549 handle case of dropping cond for segment mask 2025-03-07 14:11:58 -06:00
mrq 89e52b9877 ugh 2025-03-07 13:55:57 -06:00
mrq 6afc2b7526 gut feeling to change the attention mask 2025-03-07 13:51:59 -06:00
mrq 91ede71cf0 ugh 2025-03-06 17:19:27 -06:00
mrq 2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it) 2025-03-06 17:07:29 -06:00
mrq a30dffcca7 wandb additions (to-do eventually, upload samples as artifacts) 2025-03-06 15:44:40 -06:00
mrq ec87308d75 final tweaks before training this meme 44khz model for the 3rd time 2025-03-06 15:31:15 -06:00
mrq 5cd71ef238 QoL so I can stop having to manually inject different configs 2025-03-06 14:48:14 -06:00