mrq/vall-e

Author	SHA1	Message	Date
mrq	b2b243e7e7	addresses #9	2025-05-05 13:03:44 -05:00
mrq	d9e18037cc	new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further)	2025-04-18 20:36:44 -05:00
mrq	f144389920	the culprit was initializing the level_weights for killing newly trained models.............	2025-04-10 23:06:16 -05:00
mrq	44260f7445	tweaks	2025-04-05 10:27:07 -05:00
mrq	0e995dbf2c	is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why)	2025-04-02 17:01:24 -05:00
mrq	a1184586ef	should never have trusted mse_loss, it never works	2025-03-31 20:59:13 -05:00
mrq	2fd82a7a22	cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...)	2025-03-27 00:51:41 -05:00
mrq	8641c87611	nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression)	2025-03-25 23:06:16 -05:00
mrq	8068f24e35	cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8...	2025-03-20 15:56:15 -05:00
mrq	81acd565b3	re-enable these	2025-03-18 20:59:33 -05:00
mrq	b0dba9db07	this may bite me in the ass	2025-03-17 21:46:50 -05:00
mrq	ca8cc15271	more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation)	2025-03-14 20:18:25 -05:00
mrq	6ee505cffd	fixed dac	2025-03-12 23:17:27 -05:00
mrq	ba5f3d19b4	use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesn't (and some notes)	2025-03-12 22:47:19 -05:00
mrq	5c512717a6	len prediction for new model (and remove logit normalization since it kills inferencing)	2025-03-11 20:33:09 -05:00
mrq	8ac03aac8a	ugh	2025-03-10 21:14:56 -05:00
mrq	93044829af	one more time (could have sworn i tested it with batch size > 1)	2025-03-07 19:14:33 -06:00
mrq	5cd71ef238	QoL so I can stop having to manually inject different configs	2025-03-06 14:48:14 -06:00
mrq	1cd24f3381	a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it)	2025-03-04 14:53:02 -06:00
mrq	ddc49c89c5	the learning rate scheduler pill is a tough pill to swallow	2025-02-28 22:12:19 -06:00
mrq	a174c33db6	a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)	2025-02-28 17:56:50 -06:00
mrq	93feb5660f	do not like that	2025-02-27 23:59:56 -06:00
mrq	b8e9f3d785	maybe this will work	2025-02-27 20:42:12 -06:00
mrq	2ea387c08a	segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)	2025-02-26 21:26:13 -06:00

24 Commits