mrq/vall-e

Author	SHA1	Message	Date
mrq	0656a762af	fix vall_e.emb.transcriber	2024-10-08 19:24:43 -05:00
mrq	acdce66d4e	readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well	2024-10-05 22:53:53 -05:00
mrq	84c7419001	faster	2024-10-04 22:30:47 -05:00
mrq	a507b769a1	sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)	2024-10-04 22:18:20 -05:00
mrq	4a8e3ccf06	README tweaks, added --input-prompt-prefix as an experiment (it's literally better to just not do this, but i'll retain it in case i have a revelation on how to improve it)	2024-10-04 18:57:19 -05:00
mrq	a9fa0898a9	tweaked demo page script to sample speakers instead	2024-09-28 10:50:26 -05:00
mrq	2f1dca3089	added language selection in web UI, tweaked demo script	2024-09-28 09:49:45 -05:00
mrq	10df2ef5f3	fixed oversight where input audio does not resample (lol...)	2024-09-27 20:27:53 -05:00
mrq	039482a48e	don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)	2024-09-26 18:56:57 -05:00
mrq	ff7a1b4163	coerce into path for other sampler_types (it's required for sampling for similar utterances)	2024-09-26 18:37:56 -05:00
mrq	f24547ad4e	add top_k sampling / offset for prompt similar utterance sampling	2024-09-26 16:26:40 -05:00
mrq	9da630f73a	swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)	2024-09-25 23:31:24 -05:00
mrq	e84d466261	vall_e.plot tweaks	2024-09-24 20:05:10 -05:00
mrq	2266d34818	oops	2024-09-21 16:06:01 -05:00
mrq	c5e9142863	added option to retokenize phonemes for hdf5 (to save having to remake my hdf5 file)	2024-09-21 13:08:01 -05:00
mrq	536c11c4ac	actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)	2024-09-21 12:59:51 -05:00
mrq	d31f27119a	regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens	2024-09-21 12:29:28 -05:00
mrq	769f67dcfe	actually fix validation of phonemes in the symmap	2024-09-21 12:19:34 -05:00
mrq	c8d4716a9f	ugh	2024-09-18 21:40:57 -05:00
mrq	fe241f6a99	support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)	2024-09-18 21:34:43 -05:00
mrq	b5bec0c9ce	oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)	2024-09-18 20:19:46 -05:00
mrq	fa9d3f6c06	lang fixes / reworked phoneme symmap validation	2024-09-18 19:36:03 -05:00
mrq	84647f588a	more tweaks	2024-09-18 16:43:57 -05:00
mrq	ebac1db16c	maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more	2024-09-17 22:57:04 -05:00
mrq	6ceed866b5	faster	2024-09-17 22:44:36 -05:00
mrq	f00283440c	faster	2024-09-17 22:26:31 -05:00
mrq	be22b65300	solved my problem	2024-09-17 21:58:44 -05:00
mrq	8f41d1b324	more tweaks	2024-09-17 16:26:30 -05:00
mrq	804ddb5182	optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................)	2024-09-17 15:51:45 -05:00
mrq	a9fbe81f98	oops	2024-09-17 15:25:12 -05:00
mrq	c440c4fe7e	relegated processing similarity data into vall_e.emb.similarity since it's easier, seems to work?	2024-09-17 14:37:21 -05:00
mrq	56f25f7a9b	more stuff for similar-speaker prompt sampling (to-do: actually test if this works...)	2024-09-16 23:10:29 -05:00
mrq	69f140ba45	fix oversight with phonemizing french because espeak defines french as fr-fr instead of fr (even though spain spanish is es and not es-sp or some shit, but portugal portuguese is pt-pt)	2024-09-13 12:53:36 -05:00
mrq	4f3c7a37c8	also do text similarities (don't know what use I'll have for this)	2024-09-10 16:45:59 -05:00
mrq	1c615a0f52	helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)	2024-09-10 16:34:23 -05:00
mrq	17487ad70a	weird quirk in process_emilia.py where language gets mutated, somehow (I hate python)	2024-09-10 14:00:27 -05:00
mrq	d059f6f56d	added helper script to process Emilia (amphion/Emilia-Dataset), clean up espeak phonemes for non-English transcriptions with English words (because for some reason espeak injects (en){word}(lang) markers and it's annoying)	2024-09-09 09:57:32 -05:00
mrq	31e8b7edb8	tweaks and fixes for lora stuffs	2024-09-08 18:05:21 -05:00
mrq	54203c059d	validated rep pen for STT (sometimes needed to wrangle the model)	2024-09-08 08:30:30 -05:00
mrq	6a967f91b9	oops	2024-09-07 22:13:49 -05:00
mrq	5d66a7db52	webui cleanup, more tweaks, default to safetensors in config	2024-09-07 21:45:05 -05:00
mrq	a6ad0577b8	cleanup the resultant text from STT	2024-09-06 18:44:25 -05:00
mrq	fa93061b3e	more fixes, moved sampler state dict to a better place, eval works again	2024-09-06 16:59:56 -05:00
mrq	4bd9bb39c8	webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now)	2024-09-06 15:13:04 -05:00
mrq	d33a906119	cleanup for AR_NAR inferencing to allow both TTS and STT tasks simultaneously (need to have training eval do this too though)	2024-09-06 14:30:12 -05:00
mrq	341e19162b	fixes, again	2024-09-06 11:41:41 -05:00
mrq	94cf81d38c	tweak	2024-09-05 23:21:18 -05:00
mrq	413097f5f7	fixes	2024-09-05 21:42:59 -05:00
mrq	54547b74d8	experimental implementation of STT (need to actually test on a model, test trainer seems to work)	2024-09-05 20:43:20 -05:00
mrq	d319d33368	haha	2024-09-04 14:52:26 -05:00

529 Commits