vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	8641c87611	nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression)	2025-03-25 23:06:16 -05:00
mrq	a174c33db6	a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)	2025-02-28 17:56:50 -06:00
mrq	2ea387c08a	segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)	2025-02-26 21:26:13 -06:00
mrq	95da4e9405	made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)	2025-02-26 10:39:13 -06:00
mrq	de27115bb7	there's something wrong with it on my 4xV100 rig......	2025-02-25 15:14:08 -06:00
mrq	db181f8e88	only do auto=equal for nemo as its an FSQ	2025-02-24 21:07:44 -06:00
mrq	8f5a3997bd	another experimental flag	2025-02-24 13:50:41 -06:00
mrq	b640fabab5	borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7	2025-02-23 17:23:24 -06:00
mrq	b39aaacd77	oops	2025-02-23 11:55:43 -06:00
mrq	3019c88799	separate mask token and stop token because this might cause issues	2025-02-23 11:36:32 -06:00
mrq	6634d07576	added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed	2025-02-23 11:22:13 -06:00
mrq	67a6009555	(finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......)	2025-02-23 08:31:03 -06:00
mrq	a65c8144f4	with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already......	2025-02-13 18:38:40 -06:00
mrq	319ca09a4f	cleanup	2025-02-12 23:36:32 -06:00
mrq	b52c5c5d80	this seems to work in testing	2025-02-12 16:16:04 -06:00
mrq	e029a8804d	ironically none of this cruft gets the loss lower than the original way	2025-02-12 11:17:00 -06:00
mrq	4b31f5c808	this seems preferable	2025-02-12 00:36:50 -06:00
mrq	04fef5dad5	agony	2025-02-12 00:18:24 -06:00
mrq	47eb498046	more tweaks	2025-02-06 23:26:26 -06:00
mrq	79c504c278	cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)	2025-02-05 20:54:31 -06:00
mrq	69c1d2991f	updated mixtral backend (need this for something else)	2025-01-20 21:50:56 -06:00
mrq	1a26f789a5	added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work	2025-01-12 21:52:49 -06:00
mrq	3ab11bdc7b	oops	2025-01-05 23:53:17 -06:00
mrq	b445f4abb6	experimental	2025-01-05 19:05:00 -06:00
mrq	2e6a7625e4	experimental	2025-01-05 12:47:03 -06:00
mrq	c2e17e287b	really shoddy voice conversion implementation (it sort of works...)	2024-12-16 22:54:53 -06:00
mrq	8515038968	imagine my disappointment when the epoch finished just for it to throw an exception	2024-12-16 18:28:01 -06:00
mrq	9a62e3b824	APOLLO cringe (doesn't want to work with deepspeed)	2024-12-12 00:31:58 -06:00
mrq	6468e5d124	lol	2024-12-11 19:10:32 -06:00
mrq	3ef8894290	oops	2024-12-08 15:24:21 -06:00
mrq	1d460b9fe3	logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal masked passed to it for the longest time before I caught it a month ago)	2024-12-08 14:52:47 -06:00
mrq	5d80a2d0d4	fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now	2024-12-07 19:21:05 -06:00
mrq	34a66e1052	agnostified KD	2024-12-06 23:53:46 -06:00
mrq	23d402bf01	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
mrq	93d27be539	rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)	2024-12-04 20:31:44 -06:00
mrq	9dff68c0c5	NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up)	2024-12-04 09:30:29 -06:00
mrq	cf97560e70	minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now	2024-12-03 19:40:05 -06:00
mrq	67f7bad168	added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)	2024-11-20 14:22:12 -06:00
mrq	b1369e7824	better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)	2024-11-19 18:51:17 -06:00
mrq	190a917b3e	I did it.	2024-11-19 12:24:33 -06:00
mrq	0e621354e7	cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model)	2024-11-19 10:30:05 -06:00
mrq	5ba80686e1	two weeks of agony concludes	2024-11-18 21:29:28 -06:00
mrq	2b29790173	oops	2024-11-18 14:12:26 -06:00
mrq	6cfdf94bf9	swap priority to use nar-len if available, added notes	2024-11-18 09:40:04 -06:00
mrq	069b27570f	set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)	2024-11-17 17:04:07 -06:00
mrq	23fdba0c98	tweaks and changes	2024-11-16 15:49:06 -06:00
mrq	39096f8ff3	redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)	2024-11-14 22:17:47 -06:00
mrq	e412e98125	ugh	2024-11-14 07:34:22 -06:00
mrq	910033343c	overhauled how the right resp level / classifier gets picked to avoid cringemath	2024-11-13 13:31:17 -06:00
mrq	269648605e	move NAR-len rvq level 0 to separate embedding	2024-11-13 11:38:58 -06:00

1 2 3 4 5

213 Commits