Commit Graph

408 Commits

Author · SHA1 · Message · Date

mrq · db181f8e88 · only do auto=equal for nemo as it's an FSQ · 2025-02-24 21:07:44 -06:00
mrq · a5a04c39ef · when the · 2025-02-24 21:03:23 -06:00
mrq · 918e0dbac1 · small slop cleanup · 2025-02-24 19:03:53 -06:00
mrq · 0f39f4d7a1 · lol · 2025-02-24 17:51:35 -06:00
mrq · 33d5a7109a · it's a miracle I was able to get a semblance of audio with the naive AudioEncoder (now it interleaves properly) · 2025-02-24 14:39:12 -06:00
mrq · 8f5a3997bd · another experimental flag · 2025-02-24 13:50:41 -06:00
mrq · b640fabab5 · borrowed muon since it might work better under deepspeed and not require cruft (even though it really does not like the masked-NAR); also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7 · 2025-02-23 17:23:24 -06:00
mrq · 8f3c3e01ee · oops · 2025-02-23 12:09:56 -06:00
mrq · b39aaacd77 · oops · 2025-02-23 11:55:43 -06:00
mrq · 3019c88799 · separate mask token and stop token because this might cause issues · 2025-02-23 11:36:32 -06:00
mrq · 6634d07576 · added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed · 2025-02-23 11:22:13 -06:00
mrq · 67a6009555 · (finally) added parallel AR for cfg.model.version >= 7 (nvidia/audio-codec-44khz is being a pain and it might require training purely AR first......) · 2025-02-23 08:31:03 -06:00
mrq · ab0abd2b12 · fixes fixes fixes (a quarter of my recently processed audio returned zeroed tensors......) · 2025-02-22 09:07:33 -06:00
mrq · 13c3a08853 · nevermind, that's slow · 2025-02-14 16:35:17 -06:00
mrq · 285e493b12 · ugh.......... · 2025-02-14 16:24:34 -06:00
mrq · a65c8144f4 · with the amount of tweaks I keep making I could have probably had the nvidia/audio-codec-44khz model realized already...... · 2025-02-13 18:38:40 -06:00
mrq · e3becec0e8 · more better-er loss calc I suppose · 2025-02-13 12:49:53 -06:00
mrq · e8f182b634 · cleaned up loss calc code (it REALLY hates ignore_loss_for_inputs, but is fine with splitting with loss factors) · 2025-02-13 09:35:27 -06:00
mrq · 319ca09a4f · cleanup · 2025-02-12 23:36:32 -06:00
mrq · b52c5c5d80 · this seems to work in testing · 2025-02-12 16:16:04 -06:00
mrq · e029a8804d · ironically none of this cruft gets the loss lower than the original way · 2025-02-12 11:17:00 -06:00
mrq · 4b31f5c808 · this seems preferable · 2025-02-12 00:36:50 -06:00
mrq · 04fef5dad5 · agony · 2025-02-12 00:18:24 -06:00
mrq · 075ffef68a · ugh · 2025-02-09 13:02:51 -06:00
mrq · 47eb498046 · more tweaks · 2025-02-06 23:26:26 -06:00
mrq · 79c504c278 · cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when I fall for the latest meme codec) · 2025-02-05 20:54:31 -06:00
mrq · bb2ebe1ca2 · fixed issues that may arise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies) · 2025-02-04 20:30:07 -06:00
mrq · 0841f366e8 · I should really just grab modelling_llama wholesale (fix for the adapted attention class) · 2025-01-28 21:55:05 -06:00
mrq · e5f9da2221 · oops · 2025-01-21 11:59:24 -06:00
mrq · 69c1d2991f · updated mixtral backend (need this for something else) · 2025-01-20 21:50:56 -06:00
mrq · 1a26f789a5 · added option to play back audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work · 2025-01-12 21:52:49 -06:00
mrq · 3ab11bdc7b · oops · 2025-01-05 23:53:17 -06:00
mrq · b445f4abb6 · experimental · 2025-01-05 19:05:00 -06:00
mrq · 2e6a7625e4 · experimental · 2025-01-05 12:47:03 -06:00
mrq · 9b0d2ccbe1 · 2024-12-26 21:42:17 -06:00
mrq · 59f56ad099 · cleanup · 2024-12-24 23:14:32 -06:00
mrq · 82e8592f2a · working vall_e.cpp · 2024-12-24 17:54:48 -06:00
mrq · 497bdfc67b · more work (the wall is non-causal decoding......) · 2024-12-22 20:11:31 -06:00
mrq · 5f289db275 · ugh · 2024-12-22 16:15:24 -06:00
mrq · 0d4329d2e3 · sanity cleanup · 2024-12-22 15:05:45 -06:00
mrq · 353e478e68 · agony · 2024-12-21 22:52:10 -06:00
mrq · 91caf00212 · ugh · 2024-12-20 17:13:37 -06:00
mrq · 59bf6b8b33 · exposed additional tasks (ns, sr, vc) (vc is experimental) · 2024-12-20 11:15:29 -06:00
mrq · e7e7f48043 · livid · 2024-12-19 19:25:27 -06:00
mrq · c2c6d912ac · actually do speaker verification · 2024-12-17 10:11:14 -06:00
mrq · c2e17e287b · really shoddy voice conversion implementation (it sort of works...) · 2024-12-16 22:54:53 -06:00
mrq · 8515038968 · imagine my disappointment when the epoch finished just for it to throw an exception · 2024-12-16 18:28:01 -06:00
mrq · 4a65ac9eb7 · oops · 2024-12-15 17:21:51 -06:00
mrq · 9a62e3b824 · APOLLO cringe (doesn't want to work with deepspeed) · 2024-12-12 00:31:58 -06:00
mrq · cddf8ca814 · sort batches to try and reduce the number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them) · 2024-12-11 22:45:38 -06:00
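
The batch-sorting idea in commit cddf8ca814 can be sketched as follows. This is a hedged illustration of the general technique only, not the repository's actual code; the helper names `sort_into_batches` and `padding_cost` are invented for the example.

```python
# Sketch of length-sorted batching (assumed technique behind cddf8ca814,
# not the repo's real implementation): sorting sequences by length before
# chunking keeps similar lengths together, so each batch pads to a
# shorter per-batch maximum and fewer pad tokens are wasted.

def sort_into_batches(seqs, batch_size):
    """Group sequences into batches of near-equal length."""
    ordered = sorted(seqs, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def padding_cost(batches):
    """Total pad tokens needed when each batch pads to its longest member."""
    return sum(
        sum(max(len(s) for s in batch) - len(s) for s in batch)
        for batch in batches
    )
```

For example, six sequences of lengths 1, 9, 2, 8, 3, 7 chunked naively in that order (batch size 2) need 18 pad tokens, while the length-sorted grouping needs only 6.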