Commit Graph

439 Commits

Author SHA1 Message Date
mrq 93044829af one more time (could have sworn i tested it with batch size > 1) 2025-03-07 19:14:33 -06:00
mrq 6cea840710 oops 2025-03-07 18:57:25 -06:00
mrq dbd34b6430 add specialized calc_loss because schizo 2025-03-07 18:44:11 -06:00
mrq 8d848ed549 handle case of dropping cond for segment mask 2025-03-07 14:11:58 -06:00
mrq 89e52b9877 ugh 2025-03-07 13:55:57 -06:00
mrq 6afc2b7526 gut feeling to change the attention mask 2025-03-07 13:51:59 -06:00
mrq ec87308d75 final tweaks before training this meme 44khz model for the 3rd time 2025-03-06 15:31:15 -06:00
mrq 5cd71ef238 QoL so I can stop having to manually inject different configs 2025-03-06 14:48:14 -06:00
mrq 0d809561c6 accuracy k=1 and k=80 because im probably dumb for k=10 as the default since it does not represent any usecase 2025-03-05 16:35:34 -06:00
mrq 2fb2b732fc wow that was fast 2025-03-04 23:17:18 -06:00
mrq 462f71e2f7 ugh 2025-03-04 14:57:00 -06:00
mrq 1cd24f3381 a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) 2025-03-04 14:53:02 -06:00
mrq 0451f75e33 now that the new model seems a little more promising, i can re-document things non-cynically 2025-03-03 13:21:41 -06:00
mrq 3f1070f575 tweaks 2025-03-02 22:36:25 -06:00
mrq 17094b8002 reticulating splines 2025-03-01 17:48:51 -06:00
mrq ddc49c89c5 the learning rate scheduler pill is a tough pill to swallow 2025-02-28 22:12:19 -06:00
mrq b97faa8173 fixes... 2025-02-28 18:53:07 -06:00
mrq 4e7d885542 lol 2025-02-28 18:06:41 -06:00
mrq a174c33db6 a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) 2025-02-28 17:56:50 -06:00
mrq 09d82a26fe ugh 2025-02-28 01:06:38 -06:00
mrq 93feb5660f do not like that 2025-02-27 23:59:56 -06:00
mrq f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them 2025-02-27 23:39:20 -06:00
mrq 0a45c9c042 fix attention backend not being used 2025-02-27 21:38:38 -06:00
mrq b8e9f3d785 maybe this will work 2025-02-27 20:42:12 -06:00
mrq 01e96bafc9 ugh 2025-02-27 19:05:32 -06:00
mrq eff180248c decoupled llama backend to avoid any funny changes from transformers, removed other backends since i dont think i'll ever bother using them 2025-02-27 19:00:37 -06:00
mrq ceecac6ffe I think I made resp_parallel_training=True faster with loss factoring? 2025-02-26 23:13:32 -06:00
mrq cbd4d7d7f4 ugh 2025-02-26 21:31:10 -06:00
mrq 2ea387c08a segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) 2025-02-26 21:26:13 -06:00
mrq 95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) 2025-02-26 10:39:13 -06:00
mrq de27115bb7 there's something wrong with it on my 4xV100 rig...... 2025-02-25 15:14:08 -06:00
mrq db181f8e88 only do auto=equal for nemo as its an FSQ 2025-02-24 21:07:44 -06:00
mrq a5a04c39ef when the 2025-02-24 21:03:23 -06:00
mrq 918e0dbac1 small slop cleanup 2025-02-24 19:03:53 -06:00
mrq 0f39f4d7a1 lol 2025-02-24 17:51:35 -06:00
mrq 33d5a7109a its a miracle i was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly) 2025-02-24 14:39:12 -06:00
mrq 8f5a3997bd another experimental flag 2025-02-24 13:50:41 -06:00
mrq b640fabab5 borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7 2025-02-23 17:23:24 -06:00
mrq 8f3c3e01ee oops 2025-02-23 12:09:56 -06:00
mrq b39aaacd77 oops 2025-02-23 11:55:43 -06:00
mrq 3019c88799 separate mask token and stop token because this might cause issues 2025-02-23 11:36:32 -06:00
mrq 6634d07576 added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed 2025-02-23 11:22:13 -06:00
mrq 67a6009555 (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......) 2025-02-23 08:31:03 -06:00
mrq ab0abd2b12 fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......) 2025-02-22 09:07:33 -06:00
mrq 13c3a08853 nevermind thats slow 2025-02-14 16:35:17 -06:00
mrq 285e493b12 ugh.......... 2025-02-14 16:24:34 -06:00
mrq a65c8144f4 with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... 2025-02-13 18:38:40 -06:00
mrq e3becec0e8 more better-er loss calc I suppose 2025-02-13 12:49:53 -06:00
mrq e8f182b634 cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) 2025-02-13 09:35:27 -06:00
mrq 319ca09a4f cleanup 2025-02-12 23:36:32 -06:00