ceecac6ffe | I think I made resp_parallel_training=True faster with loss factoring? | 2025-02-26 23:13:32 -06:00
06ef3daf3c | require a minimum duration of 1 second for training because of my slop code auto-transposing that I don't wanna fix right now | 2025-02-26 22:00:33 -06:00
cbd4d7d7f4 | ugh | 2025-02-26 21:31:10 -06:00
2ea387c08a | segregated experimental changes into their own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned-up code if it actually works (nothing is working) | 2025-02-26 21:26:13 -06:00
7d2e64630c | lol | 2025-02-26 10:49:06 -06:00
95da4e9405 | made muon actually work by utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) | 2025-02-26 10:39:13 -06:00
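A minimal sketch of what that param-group split likely amounts to; the group names, the name-based filter, and the pairing of a `Muon` class with `torch.optim.AdamW` are assumptions for illustration, not this repo's actual code:

```python
import torch

def split_param_groups(model: torch.nn.Module):
    """Split parameters so Muon only sees 2D weight matrices."""
    muon_params, adamw_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Muon's orthogonalized update only makes sense for 2D weight matrices;
        # embeddings, norms, biases, and classifier heads stay on AdamW.
        if param.ndim == 2 and "embed" not in name and "classifier" not in name:
            muon_params.append(param)
        else:
            adamw_params.append(param)
    return muon_params, adamw_params
```

The second group can then feed a plain `torch.optim.AdamW(adamw_params, lr=...)` alongside whichever Muon implementation is in use.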
de27115bb7 | there's something wrong with it on my 4xV100 rig...... | 2025-02-25 15:14:08 -06:00
db181f8e88 | only do auto=equal for nemo as it's an FSQ | 2025-02-24 21:07:44 -06:00
a5a04c39ef | when the | 2025-02-24 21:03:23 -06:00
918e0dbac1 | small slop cleanup | 2025-02-24 19:03:53 -06:00
3330b5bb00 | maybe fix NaNs being thrown for immature models at fp16 for training evals | 2025-02-24 18:25:54 -06:00
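A hedged sketch of the kind of guard that commit describes (not the actual fix): clamp the NaN/inf losses an immature model can produce under fp16 so eval metrics stay finite.

```python
import torch

def sanitize_eval_loss(loss: torch.Tensor) -> torch.Tensor:
    # Replace NaN/inf with zeros so one bad eval batch doesn't poison the reported loss.
    return torch.nan_to_num(loss, nan=0.0, posinf=0.0, neginf=0.0)
```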
0f39f4d7a1 | lol | 2025-02-24 17:51:35 -06:00
33d5a7109a | it's a miracle I was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly) | 2025-02-24 14:39:12 -06:00
6e7b269147 | ugh | 2025-02-24 13:54:21 -06:00
8f5a3997bd | another experimental flag | 2025-02-24 13:50:41 -06:00
f593ee98fc | ugh | 2025-02-23 21:20:36 -06:00
cbf6b84e27 | fixed grad norm and loss scale not being reported for the local trainer | 2025-02-23 19:08:26 -06:00
b640fabab5 | borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR); also made the masked-NAR faux-causal since it might help out for cfg.model.version >= 7 | 2025-02-23 17:23:24 -06:00
d33ccd188a | ugh | 2025-02-23 12:31:07 -06:00
8f3c3e01ee | oops | 2025-02-23 12:09:56 -06:00
b39aaacd77 | oops | 2025-02-23 11:55:43 -06:00
3019c88799 | separate mask token and stop token because this might cause issues | 2025-02-23 11:36:32 -06:00
6634d07576 | added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that only sometimes seems to work with deepspeed | 2025-02-23 11:22:13 -06:00
67a6009555 | (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......) | 2025-02-23 08:31:03 -06:00
15b3c20e19 | also throw an exception for zero'd-out tensors during training (I am very paranoid now) | 2025-02-22 14:09:41 -06:00
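A rough sketch of that paranoia check; the function name and signature are assumptions, not the repo's actual code.

```python
import torch

def assert_not_zeroed(codes: torch.Tensor, path: str = "<unknown>") -> None:
    # Refuse to train on an utterance whose quantized codes came back all zeros.
    if codes.numel() == 0 or not codes.any():
        raise ValueError(f"zero'd out tensor encountered during training: {path}")
```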
ab0abd2b12 | fixes fixes fixes (a quarter of my recently processed audio returned zero'd tensors......) | 2025-02-22 09:07:33 -06:00
50506e5ebc | oops | 2025-02-20 20:55:58 -06:00
fc1ec2019d | added an option to buffer process jobs across multiple speakers to maybe squeeze out some extra throughput for vall_e.emb.process (in the event of lots of speakers with low file counts, such as Emilia) | 2025-02-20 14:56:32 -06:00
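A sketch of the buffering idea under stated assumptions (names are illustrative, not vall_e.emb.process's real internals): rather than flushing a batch per speaker, accumulate jobs across speakers until a full batch exists.

```python
def buffered_jobs(speakers, batch_size=16):
    """Yield batches of (speaker, path) jobs pooled across speakers."""
    buffer = []
    for speaker, files in speakers:
        for path in files:
            buffer.append((speaker, path))
            if len(buffer) >= batch_size:
                yield buffer
                buffer = []
    if buffer:
        # Flush whatever is left once every speaker has been walked.
        yield buffer
```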
ce1ca0124a | lol... | 2025-02-20 13:40:36 -06:00
92139b6da9 | additional cruft; added a note in the documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process | 2025-02-18 19:56:30 -06:00
596c2df11c | added arg to skip processing speakers with too few utterances, for whenever I get around to processing my subset of Emilia for nvidia/audio-codec-44khz (because Emilia has a ton of speakers with low utterance counts, and right now my focus with the nemo model is on getting it to actually speak without many problems rather than feeding it a gorillion speakers) | 2025-02-18 10:49:21 -06:00
8331eee6fa | added arg to limit vall_e.emb.process batch size since there are some speaker groups in LibriLight/Speech/whatever that have 10K utterances and I'm growing impatient | 2025-02-18 10:19:17 -06:00
8f86cf0e4e | possible logic optimization so I don't spend another 15 minutes simply iterating back to the point I was at in vall_e.emb.process | 2025-02-16 11:34:05 -06:00
0dc49ef4d5 | documentation update while I wait for more audio (between 4 and 8 seconds per utterance) to quantize for nvidia/audio-codec-44khz (I was foolish to think I could get something serviceable with just 4 seconds max per utterance) | 2025-02-15 17:42:06 -06:00
13c3a08853 | nevermind, that's slow | 2025-02-14 16:35:17 -06:00
285e493b12 | ugh.......... | 2025-02-14 16:24:34 -06:00
a65c8144f4 | with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... | 2025-02-13 18:38:40 -06:00
e3becec0e8 | more better-er loss calc I suppose | 2025-02-13 12:49:53 -06:00
e8f182b634 | cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) | 2025-02-13 09:35:27 -06:00
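A sketch of what "splitting with loss factors" reads as here, with illustrative names and weights: one cross-entropy per parallel codebook level, combined with per-level factors, instead of masking inputs out of a single flattened loss.

```python
import torch.nn.functional as F

def factored_loss(logits_per_level, targets_per_level, factors):
    """Weighted sum of per-level cross-entropy losses."""
    total = 0.0
    for logits, targets, factor in zip(logits_per_level, targets_per_level, factors):
        # logits: (N, vocab), targets: (N,) for one codebook level
        total = total + factor * F.cross_entropy(logits, targets)
    return total
```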
319ca09a4f | cleanup | 2025-02-12 23:36:32 -06:00
b52c5c5d80 | this seems to work in testing | 2025-02-12 16:16:04 -06:00
e029a8804d | ironically none of this cruft gets the loss lower than the original way | 2025-02-12 11:17:00 -06:00
4b31f5c808 | this seems preferable | 2025-02-12 00:36:50 -06:00
04fef5dad5 | agony | 2025-02-12 00:18:24 -06:00
1c0ed6abac | added notes on this unfruitful experiment | 2025-02-11 16:21:43 -06:00
e5916ea519 | for my sanity: it seems having extraneous tokens in the embedding/classifier keeps the loss/acc a little higher than it should be | 2025-02-11 14:47:35 -06:00
d4a6709fb4 | stopgap cringe to get this training session working (it does not seem fruitful) | 2025-02-11 13:45:09 -06:00
c0b46b82eb | tweaks | 2025-02-10 21:48:29 -06:00
d6a679ca5c | tweaks | 2025-02-10 20:53:08 -06:00
276a2342a4 | tweaks to processing script | 2025-02-10 19:18:13 -06:00