vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	3019c88799	separate mask token and stop token because this might cause issues	2025-02-23 11:36:32 -06:00
mrq	6634d07576	added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed	2025-02-23 11:22:13 -06:00
mrq	67a6009555	(finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......)	2025-02-23 08:31:03 -06:00
mrq	ab0abd2b12	fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......)	2025-02-22 09:07:33 -06:00
mrq	13c3a08853	nevermind thats slow	2025-02-14 16:35:17 -06:00
mrq	285e493b12	ugh..........	2025-02-14 16:24:34 -06:00
mrq	a65c8144f4	with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already......	2025-02-13 18:38:40 -06:00
mrq	e3becec0e8	more better-er loss calc I suppose	2025-02-13 12:49:53 -06:00
mrq	e8f182b634	cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors)	2025-02-13 09:35:27 -06:00
mrq	319ca09a4f	cleanup	2025-02-12 23:36:32 -06:00
mrq	b52c5c5d80	this seems to work in testing	2025-02-12 16:16:04 -06:00
mrq	e029a8804d	ironically none of this cruft gets the loss lower than the original way	2025-02-12 11:17:00 -06:00
mrq	4b31f5c808	this seems preferable	2025-02-12 00:36:50 -06:00
mrq	04fef5dad5	agony	2025-02-12 00:18:24 -06:00
mrq	075ffef68a	ugh	2025-02-09 13:02:51 -06:00
mrq	47eb498046	more tweaks	2025-02-06 23:26:26 -06:00
mrq	79c504c278	cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)	2025-02-05 20:54:31 -06:00
mrq	bb2ebe1ca2	fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies)	2025-02-04 20:30:07 -06:00
mrq	0841f366e8	I should really just grab modelling_llama wholesale (fix for the adapted attention class)	2025-01-28 21:55:05 -06:00
mrq	e5f9da2221	oops	2025-01-21 11:59:24 -06:00
mrq	69c1d2991f	updated mixtral backend (need this for something else)	2025-01-20 21:50:56 -06:00
mrq	1a26f789a5	added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work	2025-01-12 21:52:49 -06:00
mrq	3ab11bdc7b	oops	2025-01-05 23:53:17 -06:00
mrq	b445f4abb6	experimental	2025-01-05 19:05:00 -06:00
mrq	2e6a7625e4	experimental	2025-01-05 12:47:03 -06:00
mrq	9b0d2ccbe1		2024-12-26 21:42:17 -06:00
mrq	59f56ad099	cleanup	2024-12-24 23:14:32 -06:00
mrq	82e8592f2a	working vall_e.cpp	2024-12-24 17:54:48 -06:00
mrq	497bdfc67b	more work (the wall is non-causal decoding......)	2024-12-22 20:11:31 -06:00
mrq	5f289db275	ugh	2024-12-22 16:15:24 -06:00
mrq	0d4329d2e3	sanity cleanup	2024-12-22 15:05:45 -06:00
mrq	353e478e68	agony	2024-12-21 22:52:10 -06:00
mrq	91caf00212	ugh	2024-12-20 17:13:37 -06:00
mrq	59bf6b8b33	exposed additional task (ns, sr, vc) (vc is experimental)	2024-12-20 11:15:29 -06:00
mrq	e7e7f48043	livid	2024-12-19 19:25:27 -06:00
mrq	c2c6d912ac	actually do speaker verification	2024-12-17 10:11:14 -06:00
mrq	c2e17e287b	really shoddy voice conversion implementation (it sort of works...)	2024-12-16 22:54:53 -06:00
mrq	8515038968	imagine my disappointment when the epoch finished just for it to throw an exception	2024-12-16 18:28:01 -06:00
mrq	4a65ac9eb7	oops	2024-12-15 17:21:51 -06:00
mrq	9a62e3b824	APOLLO cringe (doesn't want to work with deepspeed)	2024-12-12 00:31:58 -06:00
mrq	cddf8ca814	sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)	2024-12-11 22:45:38 -06:00
mrq	6468e5d124	lol	2024-12-11 19:10:32 -06:00
mrq	3ef8894290	oops	2024-12-08 15:24:21 -06:00
mrq	1d460b9fe3	logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago)	2024-12-08 14:52:47 -06:00
mrq	5d80a2d0d4	fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now	2024-12-07 19:21:05 -06:00
mrq	61ed662856	ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)	2024-12-07 12:31:54 -06:00
mrq	34a66e1052	agnostified KD	2024-12-06 23:53:46 -06:00
mrq	953d3eb030	ugh	2024-12-06 22:35:30 -06:00
mrq	42fafbaaca	actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)	2024-12-06 21:55:20 -06:00
mrq	23d402bf01	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
mrq	93d27be539	rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)	2024-12-04 20:31:44 -06:00
mrq	9dff68c0c5	NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up)	2024-12-04 09:30:29 -06:00
mrq	cf97560e70	minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now	2024-12-03 19:40:05 -06:00
mrq	ca31da0a95	sageattn (forgot to bother with testing this the other day, seems fine)	2024-12-03 15:14:57 -06:00
mrq	84a05acb6d	touch ups in docs	2024-12-02 19:10:42 -06:00
mrq	dcaf38b359	fixed training tqdm being stubborn	2024-11-23 09:45:23 -06:00
mrq	41d7c30ea5	added much cleaner non-causal mask generation	2024-11-22 19:43:32 -06:00
mrq	c99a74e834	actually generate a causal mask because it seems sometimes it does not actually generate one because it makes assumptions	2024-11-22 18:30:24 -06:00
mrq	ccee5fc11c	that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one	2024-11-22 16:51:50 -06:00
mrq	4aa685e749	what has science done	2024-11-22 16:45:40 -06:00
mrq	147219a5e0	huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks)	2024-11-22 13:44:43 -06:00
mrq	24d888c47c	temporarily dropping support for xformers because it's breaking when using an attention mask (which i dont remember commenting out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessions	2024-11-22 11:29:12 -06:00
mrq	8aafae91fd	dont use timeembedding	2024-11-21 23:14:52 -06:00
mrq	2cef97e43f	cleanup	2024-11-21 23:08:43 -06:00
mrq	67f7bad168	added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)	2024-11-20 14:22:12 -06:00
mrq	b1369e7824	better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)	2024-11-19 18:51:17 -06:00
mrq	190a917b3e	I did it.	2024-11-19 12:24:33 -06:00
mrq	0e621354e7	cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model)	2024-11-19 10:30:05 -06:00
mrq	5ba80686e1	two weeks of agony concludes	2024-11-18 21:29:28 -06:00
mrq	2b29790173	oops	2024-11-18 14:12:26 -06:00
mrq	6cfdf94bf9	swap priority to use nar-len if available, added notes	2024-11-18 09:40:04 -06:00
mrq	069b27570f	set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)	2024-11-17 17:04:07 -06:00
mrq	88d840218d	default set cfg strength to 3.0 since the reference model is updated	2024-11-17 10:23:40 -06:00
mrq	a3e1fa3518	ugh	2024-11-17 09:28:33 -06:00
mrq	23fdba0c98	tweaks and changes	2024-11-16 15:49:06 -06:00
mrq	2fbeacfe92	ugh	2024-11-14 22:18:33 -06:00
mrq	39096f8ff3	redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)	2024-11-14 22:17:47 -06:00
mrq	e412e98125	ugh	2024-11-14 07:34:22 -06:00
mrq	c00fc18b62	actually use the right embedding for nar-len	2024-11-13 18:04:04 -06:00
mrq	3ea8a610d6	fix STT	2024-11-13 14:27:15 -06:00
mrq	910033343c	overhauled how the right resp level / classifier gets picked to avoid cringemath	2024-11-13 13:31:17 -06:00
mrq	269648605e	move NAR-len rvq level 0 to separate embedding	2024-11-13 11:38:58 -06:00
mrq	be83ddabaa	better causal-ness for split loss calc, and also do masking for NAR-len for it	2024-11-13 10:17:52 -06:00
mrq	6b76419123	ugh	2024-11-13 09:54:20 -06:00
mrq	ad7cfffc00	NAR-len RVQ-0 was being trained causally.............	2024-11-13 09:43:50 -06:00
mrq	8286aa54c8	do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so	2024-11-13 09:07:10 -06:00
mrq	0f2584eba7	new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)	2024-11-12 22:30:09 -06:00
mrq	663f07038d	haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output))	2024-11-12 16:41:58 -06:00
mrq	b09328069e	actually do CFG sampling for base AR+NAR tasks	2024-11-12 13:42:39 -06:00
mrq	2495a7ef67	Fixed STT in the web UI	2024-11-12 12:49:53 -06:00
mrq	8927bad7bc	actually fixed rep pen (for ar and nar, it seems to help with nar unmasking)	2024-11-11 21:40:19 -06:00
mrq	b1f4db39c8	threw in CFG sampling for normal model as well to experiment with	2024-11-11 20:27:38 -06:00
mrq	2f56696506	overhauled inference/sampler kwargs to stop being a bloated mess	2024-11-11 20:21:16 -06:00
mrq	a748e223ce	tweaks	2024-11-11 12:40:41 -06:00
mrq	48490757da	fixes	2024-11-10 20:37:50 -06:00
mrq	9def34cd66	lol	2024-11-10 12:48:41 -06:00
mrq	9cb0b6901b	unified nar.py into ar_nar.py	2024-11-10 12:19:48 -06:00
mrq	a9d2faf2d7	all I can do now until I wait for the model to (re)train for pure NAR	2024-11-09 22:57:34 -06:00
mrq	ad7e290a5e	ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup)	2024-11-09 19:40:02 -06:00
mrq	943fe70c10	I don't know why this fixes an assert thrown but it does	2024-11-09 19:04:13 -06:00

449 Commits