vall-e

Author	SHA1	Message	Date
mrq	cddf8ca814	sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)	2024-12-11 22:45:38 -06:00
mrq	20b87bfbd0	store metrics and only recalculate them if the output file is newer than the metrics file	2024-12-11 20:55:43 -06:00
mrq	0c69e798f7	template cleanup	2024-12-11 20:06:55 -06:00
mrq	7e54e897f7	also shifted to transformer's pipeline for transcribing	2024-12-11 19:57:53 -06:00
mrq	b81a98799b	uplifting transformer's WavLM stuff to do speaker verification instead	2024-12-11 19:30:05 -06:00
mrq	6468e5d124	lol	2024-12-11 19:10:32 -06:00
mrq	6f1ee0c6fa	Added CER, transcription/similarity model args in demo	2024-12-10 21:00:51 -06:00
mrq	8568a93dad	added WER/SIM-O metrics, added APOLLO but I need to test it	2024-12-10 20:13:21 -06:00
mrq	1d460b9fe3	logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal mask passed to it for the longest time before I caught it a month ago)	2024-12-08 14:52:47 -06:00
mrq	a032ff588f	doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)	2024-12-07 22:34:25 -06:00
mrq	5d80a2d0d4	fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now	2024-12-07 19:21:05 -06:00
mrq	84a05acb6d	touch ups in docs	2024-12-02 19:10:42 -06:00
mrq	6aee08f9c0	moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)	2024-11-20 20:37:33 -06:00
mrq	b1369e7824	better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)	2024-11-19 18:51:17 -06:00
mrq	069b27570f	set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)	2024-11-17 17:04:07 -06:00
mrq	88d840218d	default set cfg strength to 3.0 since the reference model is updated	2024-11-17 10:23:40 -06:00
mrq	0f2584eba7	new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)	2024-11-12 22:30:09 -06:00
mrq	9e65e05e83	more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)	2024-11-04 18:00:33 -06:00
mrq	c83670c38c	Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)	2024-11-03 19:19:15 -06:00
mrq	d229725c76	more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)	2024-11-03 18:31:28 -06:00
mrq	aee08b7307	changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system)	2024-11-03 09:58:29 -06:00
mrq	a22534e8f4	layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)	2024-10-30 20:05:45 -05:00
mrq	4049f51ba9	added option to load lora directly from the model file itself with --lora	2024-10-26 00:13:10 -05:00
mrq	8920e5e86b	actually have beam_width in the webUI work	2024-10-22 22:06:22 -05:00
mrq	8eb9a4056b	modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling	2024-10-22 18:12:39 -05:00
mrq	1a02cd5bce	modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file	2024-10-21 19:52:02 -05:00
mrq	02dfc60ac3	ugh	2024-10-18 17:23:22 -05:00
mrq	c8f31db1de	default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)	2024-10-18 16:58:56 -05:00
mrq	07f4935a75	more tweaks	2024-10-18 13:19:36 -05:00
mrq	0dfab973e7	oops	2024-10-18 09:40:06 -05:00
mrq	70cf694cfd	output attention scores for SDPA/flash, since naive attention seems broken	2024-10-12 12:09:17 -05:00
mrq	541e45263c	ugh	2024-10-12 11:29:16 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	85d85c1351	more arg creep for demo page	2024-10-10 19:40:01 -05:00
mrq	301468f519	<<	2024-10-10 19:13:52 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	96d05be73c	demo page tweaks	2024-10-10 13:52:37 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	a9fa0898a9	tweaked demo page script to sample speakers instead	2024-09-28 10:50:26 -05:00
mrq	2f1dca3089	added language selection in web UI, tweaked demo script	2024-09-28 09:49:45 -05:00
mrq	039482a48e	don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)	2024-09-26 18:56:57 -05:00
mrq	9da630f73a	swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)	2024-09-25 23:31:24 -05:00
mrq	32287710a2	moved prints to use logger, edited readme (fused_attn doesn't seem stable for training)	2024-08-29 13:27:16 -05:00
mrq	3a65cc4b22	fix issue with sft and shared tensors...	2024-08-04 19:56:21 -05:00
mrq	ad024f400f	actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji	2024-07-21 23:21:37 -05:00
mrq	3e5ca3a201	more demo page tweaks	2024-07-21 19:31:13 -05:00
mrq	7366f36f81	oops	2024-07-21 19:17:25 -05:00
mrq	e19aa643a6	cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training	2024-07-21 19:12:03 -05:00
mrq	ba7ee8c0ee	added demo link to readme	2024-07-19 21:22:30 -05:00
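
Note on cddf8ca814: the message describes sorting batches to reduce the number of padded tokens in batched inference. The following is a minimal sketch of that general idea, not the repository's code. The function names `batch_indices` and `padding_cost` are hypothetical, and it assumes each batch is padded to the length of its longest sequence.

```python
from typing import List, Sequence


def batch_indices(lengths: Sequence[int], batch_size: int,
                  sort_by_length: bool = True) -> List[List[int]]:
    """Group sequence indices into batches (hypothetical helper).

    Returning indices rather than the sequences themselves lets the caller
    restore the original order of the outputs after inference.
    """
    order = list(range(len(lengths)))
    if sort_by_length:
        # Neighbours in the sorted order have similar lengths, so each
        # batch needs little padding to reach its longest member.
        order.sort(key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_cost(lengths: Sequence[int], batches: List[List[int]]) -> int:
    """Count pad tokens when every batch is padded to its longest sequence."""
    total = 0
    for batch in batches:
        longest = max(lengths[i] for i in batch)
        total += sum(longest - lengths[i] for i in batch)
    return total


if __name__ == "__main__":
    # Illustrative lengths only: short and long inputs interleaved.
    lengths = [3, 50, 4, 48, 5, 52]
    unsorted_batches = batch_indices(lengths, batch_size=2, sort_by_length=False)
    sorted_batches = batch_indices(lengths, batch_size=2, sort_by_length=True)
    print("pad tokens, input order:", padding_cost(lengths, unsorted_batches))
    print("pad tokens, sorted:     ", padding_cost(lengths, sorted_batches))
```

Sorting only changes which inputs share a batch. Because the batches hold indices, outputs can be written back to their original positions afterwards.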

52 Commits