vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	814146a5e0	more settings bloat because there seems to be instability with the encoder as-is	2025-04-12 12:53:44 -05:00
mrq	f144389920	the culprit was initializing the level_weights for killing newly trained models.............	2025-04-10 23:06:16 -05:00
mrq	bfe70e9d56	ugh	2025-04-03 23:26:00 -05:00
mrq	0e995dbf2c	is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why)	2025-04-02 17:01:24 -05:00
mrq	a1184586ef	should never have trusted mse_loss, it never works	2025-03-31 20:59:13 -05:00
mrq	6ae282e090	re-added noise dataloader sampler whatever for the old implementation's other tasks that require it	2025-03-28 15:07:06 -05:00
mrq	2fd82a7a22	cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...)	2025-03-27 00:51:41 -05:00
mrq	8641c87611	nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression)	2025-03-25 23:06:16 -05:00
mrq	d1d91295b3	add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation.........	2025-03-21 19:05:49 -05:00
mrq	5479d2eacc	more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.)	2025-03-18 19:34:37 -05:00
mrq	b0dba9db07	this may bite me in the ass	2025-03-17 21:46:50 -05:00
mrq	2053580838	updated dataloader to hopefully reduce RAM usage	2025-03-15 13:14:37 -05:00
mrq	9cfbf94b1c	config-ify the len_loss_factor	2025-03-14 20:30:48 -05:00
mrq	ca8cc15271	more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation)	2025-03-14 20:18:25 -05:00
mrq	6ee505cffd	fixed dac	2025-03-12 23:17:27 -05:00
mrq	2ccf1b5740	actually do duration prediction	2025-03-11 22:14:54 -05:00
mrq	5c512717a6	len prediction for new model (and remove logit normalization since it kills inferencing)	2025-03-11 20:33:09 -05:00
mrq	5670fcb23f	hopefully the final tweaks needed for this bastard of a model	2025-03-10 20:59:11 -05:00
mrq	6cea840710	oops	2025-03-07 18:57:25 -06:00
mrq	dbd34b6430	add specialized calc_loss because schizo	2025-03-07 18:44:11 -06:00
mrq	8d848ed549	handle case of dropping cond for segment mask	2025-03-07 14:11:58 -06:00
mrq	6afc2b7526	gut feeling to change the attention mask	2025-03-07 13:51:59 -06:00
mrq	2dd80a03ff	stuff for interfacing with the loss scaler value (because I want to cap it)	2025-03-06 17:07:29 -06:00
mrq	5cd71ef238	QoL so I can stop having to manually inject different configs	2025-03-06 14:48:14 -06:00
mrq	1d3290b023	could have sworn this worked before, might have broke it when i decoupled from omegaconf	2025-03-01 19:30:26 -06:00
mrq	ddc49c89c5	the learning rate scheduler pill is a tough pill to swallow	2025-02-28 22:12:19 -06:00
mrq	a174c33db6	a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)	2025-02-28 17:56:50 -06:00
mrq	f4f435d7f5	when you already had these ideas to stabilize training but you just ignored them	2025-02-27 23:39:20 -06:00
mrq	2ea387c08a	segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)	2025-02-26 21:26:13 -06:00
mrq	8f5a3997bd	another experimental flag	2025-02-24 13:50:41 -06:00
mrq	ab0abd2b12	fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)	2025-02-22 09:07:33 -06:00
mrq	a65c8144f4	with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already......	2025-02-13 18:38:40 -06:00
mrq	e8f182b634	cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)	2025-02-13 09:35:27 -06:00
mrq	04fef5dad5	agony	2025-02-12 00:18:24 -06:00
mrq	e5916ea519	for my sanity it seems having extraneous tokens in the embedding/classifier has the loss/acc a little higher than it should	2025-02-11 14:47:35 -06:00
mrq	7592befc53	updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...)	2025-02-05 21:13:20 -06:00
mrq	79c504c278	cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)	2025-02-05 20:54:31 -06:00
mrq	bb2ebe1ca2	fixed issues that may rise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies	2025-02-04 20:30:07 -06:00
mrq	b445f4abb6	experimental	2025-01-05 19:05:00 -06:00
mrq	2e6a7625e4	experimental	2025-01-05 12:47:03 -06:00
mrq	91caf00212	ugh	2024-12-20 17:13:37 -06:00
mrq	53230efd74	changed prompt_inject_noise to prompt_inject_noise_p so I can have another reason to do this post-training	2024-12-19 19:28:50 -06:00
mrq	09804ecc16	APOLLO tweaks to make it work with deepspeed	2024-12-13 23:03:52 -06:00
mrq	6468e5d124	lol	2024-12-11 19:10:32 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	5d80a2d0d4	fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now	2024-12-07 19:21:05 -06:00
mrq	1f54bf5b40	revert sageattn back to optional dependency because it's not on windows, force resize_modules on by default because I broke something	2024-12-07 17:09:39 -06:00
mrq	f97e8b0c7f	ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total	2024-12-07 09:52:51 -06:00
mrq	34a66e1052	agnostified KD	2024-12-06 23:53:46 -06:00
mrq	42fafbaaca	actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)	2024-12-06 21:55:20 -06:00


253 Commits