Commit Graph

143 Commits

Author SHA1 Message Date
mrq 2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it) 2025-03-06 17:07:29 -06:00
mrq 1cd24f3381 a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) 2025-03-04 14:53:02 -06:00
mrq 3f1070f575 tweaks 2025-03-02 22:36:25 -06:00
mrq ddc49c89c5 the learning rate scheduler pill is a tough pill to swallow 2025-02-28 22:12:19 -06:00
mrq a174c33db6 a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) 2025-02-28 17:56:50 -06:00
mrq 09d82a26fe ugh 2025-02-28 01:06:38 -06:00
mrq f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them 2025-02-27 23:39:20 -06:00
mrq 95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) 2025-02-26 10:39:13 -06:00
mrq cbf6b84e27 fixed grad norm and loss scale not reporting for local trainer 2025-02-23 19:08:26 -06:00
mrq b640fabab5 borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7 2025-02-23 17:23:24 -06:00
mrq 3019c88799 separate mask token and stop token because this might cause issues 2025-02-23 11:36:32 -06:00
mrq 6634d07576 added muon optimizer through kludge hacks because it necessitates a second optimizer in tandum that seems to only sometimes work with deepspeed 2025-02-23 11:22:13 -06:00
mrq a65c8144f4 with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... 2025-02-13 18:38:40 -06:00
mrq d4a6709fb4 stopgap cringe to get this training session working (it does not seem fruitful) 2025-02-11 13:45:09 -06:00
mrq 353e478e68 agony 2024-12-21 22:52:10 -06:00
mrq d85273609e corrected export.py's --hf 2024-12-20 15:17:13 -06:00
mrq c2c6d912ac actually do speaker verification 2024-12-17 10:11:14 -06:00
mrq 8515038968 imagine my disappointment when the epoch finished just for it to throw an exception 2024-12-16 18:28:01 -06:00
mrq 2ba6b483dc ugh 2024-12-14 22:43:51 -06:00
mrq 3dd31e74d1 finally figured out a clean way to handle "resuming" the tqdm bar 2024-12-14 18:44:43 -06:00
mrq 35389481ee move lazy-stored ortho matrix to the grad device for apollo because agony 2024-12-13 23:22:26 -06:00
mrq 09804ecc16 APOLLO tweaks to make it work with deepspeed 2024-12-13 23:03:52 -06:00
mrq 64c67160a3 tweaks 2024-12-13 19:00:35 -06:00
mrq f41251f648 more fixes for local engine backend 2024-12-12 14:38:42 -06:00
mrq 9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq b81a98799b uplifting transformer's WavLM stuff to do speaker verification instead 2024-12-11 19:30:05 -06:00
mrq 6f1ee0c6fa Added CER, transcription/similarity model args in demo 2024-12-10 21:00:51 -06:00
mrq 8568a93dad added WER/SIM-O metrics, added APOLLO but I need to test it 2024-12-10 20:13:21 -06:00
mrq 34a66e1052 agnostified KD 2024-12-06 23:53:46 -06:00
mrq 42fafbaaca actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) 2024-12-06 21:55:20 -06:00
mrq dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00
mrq 88d840218d default set cfg strength to 3.0 since the reference model is updated 2024-11-17 10:23:40 -06:00
mrq 29e45be0b4 tweaks to bucket sampling 2024-11-13 11:09:24 -06:00
mrq b2eca271a8 ugh 2024-11-13 10:35:44 -06:00
mrq ad7cfffc00 NAR-len RVQ-0 was being trained causally............. 2024-11-13 09:43:50 -06:00
mrq 976ee87f6f resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated 2024-11-13 09:09:28 -06:00
mrq 0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq 2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq 354f8e059d store dataset hash alongside state dict so it can be ignored if mismatched 2024-11-11 18:16:56 -06:00
mrq f7b8b1e825 dropped subtrain dataloader since its useless to duplicate 2024-11-11 17:00:49 -06:00
mrq cf9df71f2c use homwbrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used) 2024-11-11 16:32:08 -06:00
mrq a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq 8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq fc8dfd8617 made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar) 2024-10-18 16:55:00 -05:00
mrq 75b90be325 cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified 2024-10-17 17:06:48 -05:00
mrq a507b769a1 sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit) 2024-10-04 22:18:20 -05:00
mrq 769f67dcfe actually fix validation of phonemes in the symmap 2024-09-21 12:19:34 -05:00
mrq b5bec0c9ce oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing) 2024-09-18 20:19:46 -05:00
mrq 84647f588a more tweaks 2024-09-18 16:43:57 -05:00
mrq ebac1db16c maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more 2024-09-17 22:57:04 -05:00