d6cd848c32 | goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again | 2025-04-06 21:05:29 -05:00
1e22519d94 | diagnosed both hf/llama.cpp versions as probably just a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf) | 2025-04-05 22:05:39 -05:00
c34763769a | ugh | 2025-04-05 18:58:25 -05:00
b6692ce3de | ugh | 2025-04-05 18:20:46 -05:00
4a909ceff8 | temp fix for vall_e.cpp demask scoring regression | 2025-04-05 11:04:26 -05:00
44260f7445 | tweaks | 2025-04-05 10:27:07 -05:00
0ede3bfc12 | updated vall_e.cpp, but I could have sworn it worked much better than this... | 2025-04-05 01:22:51 -05:00
28d39ef962 | should not be working late | 2025-04-03 23:32:58 -05:00
bfe70e9d56 | ugh | 2025-04-03 23:26:00 -05:00
2e93438867 | reintroduced sampler_type = speaker because I think this might salvage the nemo model's speaker similarity | 2025-04-03 19:01:10 -05:00
caad99ab78 | fix for bsz>1 because I forgot the old implementation implicitly handles this | 2025-04-02 17:17:37 -05:00
068dbdb785 | ugh | 2025-04-02 17:05:16 -05:00
0e995dbf2c | is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because my ROCm setup has a botched SDPA for who knows what reason) | 2025-04-02 17:01:24 -05:00
7a0956863d | oops | 2025-03-31 21:11:43 -05:00
a1184586ef | should never have trusted mse_loss, it never works | 2025-03-31 20:59:13 -05:00
99f251c768 | slight tweaks to condition-less NS/SR | 2025-03-30 10:37:40 -05:00
478aea0e8c | tweaks | 2025-03-28 19:49:54 -05:00
6ae282e090 | re-added the noise dataloader sampler for the old implementation's other tasks that require it | 2025-03-28 15:07:06 -05:00
90b3509404 | I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb Python aliasing rules | 2025-03-27 13:27:51 -05:00
2fd82a7a22 | cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) | 2025-03-27 00:51:41 -05:00
4d777b5618 | add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) | 2025-03-26 12:08:47 -05:00
09e9438941 | ugh | 2025-03-25 23:24:01 -05:00
8641c87611 | nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) | 2025-03-25 23:06:16 -05:00
aa8b32d97e | added more notes (although I could have sworn I had more notes that I can't recall) | 2025-03-25 18:53:06 -05:00
df5b870908 | added remark about not using sliding attention | 2025-03-22 12:44:34 -05:00
02a8bcbe29 | fixed errant index error (although it makes me wonder if my segmented masking is still flawed) | 2025-03-21 23:41:34 -05:00
d1d91295b3 | add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation... | 2025-03-21 19:05:49 -05:00
589cfb0e18 | yuge speedup because of a dumb oversight | 2025-03-20 17:39:41 -05:00
8068f24e35 | cleaned up parallel nar, I think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... | 2025-03-20 15:56:15 -05:00
9a7458cf17 | fixed inferencing since I did delete the len_emb; some more notes on the model since it seems I just had bad experimental settings | 2025-03-19 22:41:48 -05:00
61de653ad9 | now causal training should work again | 2025-03-19 14:20:19 -05:00
85b9dd47c1 | ugh | 2025-03-19 13:31:50 -05:00
81acd565b3 | re-enable these | 2025-03-18 20:59:33 -05:00
5479d2eacc | more tweaks to the new implementation (properly trim the len stuff to save some params, set the decoder's d_ffn expansion to 2 to maybe also make it faster, etc.) | 2025-03-18 19:34:37 -05:00
9a8a8e3195 | off by one bateman | 2025-03-18 08:40:43 -05:00
0280e72257 | ugh | 2025-03-17 21:49:45 -05:00
b0dba9db07 | this may bite me in the ass | 2025-03-17 21:46:50 -05:00
2dfef693c4 | comments for clarity | 2025-03-16 11:30:23 -05:00
c5475ebc91 | another dataloader optimization | 2025-03-15 20:18:58 -05:00
bee2688dea | ugh | 2025-03-15 16:50:21 -05:00
2053580838 | updated dataloader to hopefully reduce RAM usage | 2025-03-15 13:14:37 -05:00
9cfbf94b1c | config-ify the len_loss_factor | 2025-03-14 20:30:48 -05:00
ca8cc15271 | more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce which audio backend to use now that I'm supporting other ones again // added easy top-sampler settings back for the new implementation) | 2025-03-14 20:18:25 -05:00
6ee505cffd | fixed dac | 2025-03-12 23:17:27 -05:00
ba5f3d19b4 | use the FSQ-targeted encoder/decoder wholesale since it works for EnCodec too, whereas the RVQ-targeted encoder/decoder doesn't (and some notes) | 2025-03-12 22:47:19 -05:00
2ccf1b5740 | actually do duration prediction | 2025-03-11 22:14:54 -05:00
5c512717a6 | len prediction for new model (and remove logit normalization since it kills inferencing) | 2025-03-11 20:33:09 -05:00
5f98543d4d | ughh | 2025-03-10 21:18:57 -05:00
8ac03aac8a | ugh | 2025-03-10 21:14:56 -05:00
5670fcb23f | hopefully the final tweaks needed for this bastard of a model | 2025-03-10 20:59:11 -05:00