vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	8838babcba	sanity checks (and I realized that the model actually had langs set to 4 in the yaml for KO/ZH so................)	2024-12-19 19:08:57 -06:00
mrq	7617b6485f	instead just compute a bunch of stuff on the transcriptions to store later in different names so I can just retrieve what I want, also added tongue twisters for nefarious reasons	2024-12-18 23:43:11 -06:00
mrq	4775edaa41	added text cleaning/normalization for wer purposes but it amounts to nothing desu	2024-12-18 19:58:53 -06:00
mrq	9090c34f10	cringe script to process seed-tts-eval's eval dataset into something i can easily use	2024-12-17 22:47:12 -06:00
mrq	ed152f78df	tweaks to prompt duration to allow me to divorce how i use it for training with how I'm using it for the demo page, and demo page tweaks to make my life easier	2024-12-17 19:33:04 -06:00
mrq	7129582303	actually do proper wer/cer calculation by un-normalizing the scores	2024-12-17 14:22:30 -06:00
mrq	c2c6d912ac	actually do speaker verification	2024-12-17 10:11:14 -06:00
mrq	c2e17e287b	really shoddy voice conversion implementation (it sort of works...)	2024-12-16 22:54:53 -06:00
mrq	f41251f648	more fixes for local engine backend	2024-12-12 14:38:42 -06:00
mrq	cddf8ca814	sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)	2024-12-11 22:45:38 -06:00
mrq	20b87bfbd0	store metrics and only recalculate them if the output file is newer than the metrics file	2024-12-11 20:55:43 -06:00
mrq	0c69e798f7	template cleanup	2024-12-11 20:06:55 -06:00
mrq	7e54e897f7	also shifted to the transformers pipeline for transcribing	2024-12-11 19:57:53 -06:00
mrq	b81a98799b	uplifting the transformers WavLM stuff to do speaker verification instead	2024-12-11 19:30:05 -06:00
mrq	6468e5d124	lol	2024-12-11 19:10:32 -06:00
mrq	6f1ee0c6fa	Added CER, transcription/similarity model args in demo	2024-12-10 21:00:51 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	1d460b9fe3	logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago)	2024-12-08 14:52:47 -06:00
mrq	a032ff588f	doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)	2024-12-07 22:34:25 -06:00
mrq	5d80a2d0d4	fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now	2024-12-07 19:21:05 -06:00
mrq	84a05acb6d	touch ups in docs	2024-12-02 19:10:42 -06:00
mrq	6aee08f9c0	moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)	2024-11-20 20:37:33 -06:00
mrq	b1369e7824	better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)	2024-11-19 18:51:17 -06:00
mrq	069b27570f	set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)	2024-11-17 17:04:07 -06:00
mrq	88d840218d	default set cfg strength to 3.0 since the reference model is updated	2024-11-17 10:23:40 -06:00
mrq	0f2584eba7	new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)	2024-11-12 22:30:09 -06:00
mrq	9e65e05e83	more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)	2024-11-04 18:00:33 -06:00
mrq	c83670c38c	Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)	2024-11-03 19:19:15 -06:00
mrq	d229725c76	more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)	2024-11-03 18:31:28 -06:00
mrq	aee08b7307	changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system)	2024-11-03 09:58:29 -06:00
mrq	a22534e8f4	layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)	2024-10-30 20:05:45 -05:00
mrq	4049f51ba9	added option to load lora directly from the model file itself with --lora	2024-10-26 00:13:10 -05:00
mrq	8920e5e86b	actually have beam_width in the webUI work	2024-10-22 22:06:22 -05:00
mrq	8eb9a4056b	modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling	2024-10-22 18:12:39 -05:00
mrq	1a02cd5bce	modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file	2024-10-21 19:52:02 -05:00
mrq	02dfc60ac3	ugh	2024-10-18 17:23:22 -05:00
mrq	c8f31db1de	default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)	2024-10-18 16:58:56 -05:00
mrq	07f4935a75	more tweaks	2024-10-18 13:19:36 -05:00
mrq	0dfab973e7	oops	2024-10-18 09:40:06 -05:00
mrq	70cf694cfd	output attention scores for SDPA/flash, since naive attention seems broken	2024-10-12 12:09:17 -05:00
mrq	541e45263c	ugh	2024-10-12 11:29:16 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	85d85c1351	more arg creep for demo page	2024-10-10 19:40:01 -05:00
mrq	301468f519	<<	2024-10-10 19:13:52 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	96d05be73c	demo page tweaks	2024-10-10 13:52:37 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	a9fa0898a9	tweaked demo page script to sample speakers instead	2024-09-28 10:50:26 -05:00
mrq	2f1dca3089	added language selection in web UI, tweaked demo script	2024-09-28 09:49:45 -05:00

61 Commits