mrq/vall-e

Author	SHA1	Message	Date
mrq	5cd71ef238	QoL so I can stop having to manually inject different configs	2025-03-06 14:48:14 -06:00
mrq	2fb2b732fc	wow that was fast	2025-03-04 23:17:18 -06:00
mrq	0451f75e33	now that the new model seems a little more promising, i can re-document things non-cynically	2025-03-03 13:21:41 -06:00
mrq	3f1070f575	tweaks	2025-03-02 22:36:25 -06:00
mrq	4afa4ccce5	at wits end (perhaps the semantic token approach is the toughest pill to swallow)	2025-03-01 21:03:25 -06:00
mrq	a174c33db6	a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)	2025-02-28 17:56:50 -06:00
mrq	eff180248c	decoupled llama backend to avoid any funny changes from transformers, removed other backends since i dont think i'll ever bother using them	2025-02-27 19:00:37 -06:00
mrq	95da4e9405	made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)	2025-02-26 10:39:13 -06:00
mrq	92139b6da9	additional cruft, added a note in documentation to be aware of NUMA node topology when running vall_e.emb.process with more than one process	2025-02-18 19:56:30 -06:00
mrq	0dc49ef4d5	documentation update while I wait for more audio (between 4 and 8 seconds per utterance) to quantize for nvidia/audio-codec-44khz (I was foolish to think I can get something serviceable with just 4 seconds max for an utterance)	2025-02-15 17:42:06 -06:00
mrq	04fef5dad5	agony	2025-02-12 00:18:24 -06:00
mrq	1c0ed6abac	added notes on this unfruitful experiment	2025-02-11 16:21:43 -06:00
mrq	9fa87c417a	added option to use raw text rather than the IPA phonemes (it requires a model trained on raw text)	2025-01-06 00:10:43 -06:00
mrq	9b0d2ccbe1		2024-12-26 21:42:17 -06:00
mrq	59bf6b8b33	exposed additional task (ns, sr, vc) (vc is experimental)	2024-12-20 11:15:29 -06:00
mrq	8515038968	imagine my disappointment when the epoch finished just for it to throw an exception	2024-12-16 18:28:01 -06:00
mrq	f41251f648	more fixes for local engine backend	2024-12-12 14:38:42 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	a6c745bafb	chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted	2024-12-09 14:26:19 -06:00
mrq	a032ff588f	doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)	2024-12-07 22:34:25 -06:00
mrq	93d27be539	rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)	2024-12-04 20:31:44 -06:00
mrq	9dff68c0c5	NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up)	2024-12-04 09:30:29 -06:00
mrq	ca31da0a95	sageattn (forgot to bother with testing this the other day, seems fine)	2024-12-03 15:14:57 -06:00
mrq	31ab90d84a	cringe code to convert to LlamaForCausalLM-happy weights + tokenizer dict (still need to write logic to actually use these weights for proper inferencing)	2024-12-03 10:18:58 -06:00
mrq	84a05acb6d	touch ups in docs	2024-12-02 19:10:42 -06:00
mrq	67f7bad168	added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)	2024-11-20 14:22:12 -06:00
mrq	efeb55e1b7	documentation update	2024-11-19 19:19:34 -06:00
mrq	190a917b3e	I did it.	2024-11-19 12:24:33 -06:00
mrq	5ba80686e1	two weeks of agony concludes	2024-11-18 21:29:28 -06:00
mrq	6cfdf94bf9	swap priority to use nar-len if available, added notes	2024-11-18 09:40:04 -06:00
mrq	23fdba0c98	tweaks and changes	2024-11-16 15:49:06 -06:00
mrq	39096f8ff3	redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)	2024-11-14 22:17:47 -06:00
mrq	2495a7ef67	Fixed STT in the web UI	2024-11-12 12:49:53 -06:00
mrq	354f8e059d	store dataset hash alongside state dict so it can be ignored if mismatched	2024-11-11 18:16:56 -06:00
mrq	f7b8b1e825	dropped subtrain dataloader since it's useless to duplicate	2024-11-11 17:00:49 -06:00
mrq	9cb0b6901b	unified nar.py into ar_nar.py	2024-11-10 12:19:48 -06:00
mrq	c6a38693a2	This better work	2024-11-09 18:04:59 -06:00
mrq	8b3d1cf70a	Something's Wrong	2024-11-09 15:07:43 -06:00
mrq	dcd5fecff3	some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audio after 3 seconds or so)	2024-11-09 12:12:46 -06:00
mrq	c127c4e488	'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)	2024-11-07 21:19:14 -06:00
mrq	e108c54daf	new NAR-len training paradigm......	2024-11-07 11:32:11 -06:00
mrq	5698188824	I'm such an idiot	2024-11-07 09:10:18 -06:00
mrq	105ed51159	I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something)	2024-11-06 19:17:12 -06:00
mrq	bcabde3454	more notes	2024-11-06 13:51:28 -06:00
mrq	e58a9469a3	move layerskip to experimental settings.......	2024-11-05 20:37:06 -06:00
mrq	d5aa8186f0	more doc	2024-11-05 16:53:00 -06:00
mrq	9901c4f8ca	documentation under ./docs/	2024-11-05 16:11:01 -06:00

47 Commits