vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	c0b46b82eb	tweaks	2025-02-10 21:48:29 -06:00
mrq	d6a679ca5c	tweaks	2025-02-10 20:53:08 -06:00
mrq	276a2342a4	tweaks to processing script	2025-02-10 19:18:13 -06:00
mrq	b3f9b76fd9	invalidate a path if loading via metadata and entry is not in hdf5 (to avoid reparsing my metadata since I'm using a partial copy of my dataset at the moment)	2025-02-10 14:43:15 -06:00
mrq	075ffef68a	ugh	2025-02-09 13:02:51 -06:00
mrq	953015748f	ugh	2025-02-07 20:49:28 -06:00
mrq	ed94b261dc	could have sworn i had 'vall_e.emb.process --dtype' working, also possible RAM optimization so I can stop locking up my server when firing four encoding processes	2025-02-07 18:52:19 -06:00
mrq	67a9401cce	oops	2025-02-06 15:14:14 -06:00
mrq	712ce4af5d	maybe fixed errors with DAC backend, added option to limit by duration in emb.process (because I only really need short utternaces right now and I'm not ready to spend a week on processing everything again)	2025-02-06 12:37:18 -06:00
mrq	299cc88821	re-added amp encoding/decoding for audio, possible bad idea to ignore using amp instead if requested	2025-02-05 21:55:06 -06:00
mrq	7592befc53	updated vall_e.emb.process to allow for batched processing, some typo fixes (it's painfully slow on my 7900XTX...)	2025-02-05 21:13:20 -06:00
mrq	79c504c278	cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec)	2025-02-05 20:54:31 -06:00
mrq	84174c1c1b	oops	2025-02-05 10:25:03 -06:00
mrq	bb2ebe1ca2	fixed issues that may rise from updating transformers with attention, added nvidia/audio-codec-44khz backend support (by gutting everything necessary because I do NOT want to install more dependencies	2025-02-04 20:30:07 -06:00
mrq	c2c6d912ac	actually do speaker verification	2024-12-17 10:11:14 -06:00
mrq	cd4a5f427c	KO/ZH model soon	2024-12-15 17:01:14 -06:00
mrq	20b87bfbd0	store metrics and only recalculate them if the output file is newer than the metrics file	2024-12-11 20:55:43 -06:00
mrq	0c69e798f7	template cleanup	2024-12-11 20:06:55 -06:00
mrq	7e54e897f7	also shifted to transformer's pipeline for transcribing	2024-12-11 19:57:53 -06:00
mrq	b81a98799b	uplifting transformer's WavLM stuff to do speaker verification instead	2024-12-11 19:30:05 -06:00
mrq	6f1ee0c6fa	Added CER, transcription/similarity model args in demo	2024-12-10 21:00:51 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	a6c745bafb	chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted	2024-12-09 14:26:19 -06:00
mrq	a032ff588f	doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)	2024-12-07 22:34:25 -06:00
mrq	48490757da	fixes	2024-11-10 20:37:50 -06:00
mrq	bbc2de3713	ugh	2024-11-05 11:50:05 -06:00
mrq	3826f9bae4	saner mask creation? (it doesnt matter, kv cache wont work)	2024-11-02 21:00:21 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	52299127ab	fix vall_e.emb.process	2024-10-08 20:00:34 -05:00
mrq	0656a762af	fix vall_e.emb.transcriber	2024-10-08 19:24:43 -05:00
mrq	10df2ef5f3	fixed oversight where input audio does not resample (lol...)	2024-09-27 20:27:53 -05:00
mrq	c8d4716a9f	ugh	2024-09-18 21:40:57 -05:00
mrq	fa9d3f6c06	lang fixes / reworked phoneme symmap validation	2024-09-18 19:36:03 -05:00
mrq	84647f588a	more tweaks	2024-09-18 16:43:57 -05:00
mrq	ebac1db16c	maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more	2024-09-17 22:57:04 -05:00
mrq	6ceed866b5	faster	2024-09-17 22:44:36 -05:00
mrq	f00283440c	faster	2024-09-17 22:26:31 -05:00
mrq	be22b65300	solved my problem	2024-09-17 21:58:44 -05:00
mrq	8f41d1b324	more tweaks	2024-09-17 16:26:30 -05:00
mrq	804ddb5182	optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................)	2024-09-17 15:51:45 -05:00
mrq	a9fbe81f98	oops	2024-09-17 15:25:12 -05:00
mrq	c440c4fe7e	relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?	2024-09-17 14:37:21 -05:00
mrq	56f25f7a9b	more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)	2024-09-16 23:10:29 -05:00
mrq	69f140ba45	fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt)	2024-09-13 12:53:36 -05:00
mrq	4f3c7a37c8	also do text similarities (dont know what use I'll have for this)	2024-09-10 16:45:59 -05:00
mrq	1c615a0f52	helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)	2024-09-10 16:34:23 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	054d28573a	my DAC dataset again managed to only have some utterances with only 8 of 9 RVQ levels, this fixes an oversight from it	2024-08-09 21:18:01 -05:00
mrq	79a6781c9e	fix vall_e.data --action=hdf5 actually transcribing because past me completely forgot it tried to already put the transcribe/process dataset scripts inside the module before	2024-08-08 07:51:42 -05:00

102 Commits