vall-e

Author	SHA1	Message	Date
mrq	ec79230965	shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)	2024-11-01 21:30:06 -05:00
mrq	ef1c17430f	skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage)	2024-11-01 20:54:53 -05:00
mrq	fb8faa295b	actually float16(+AMP) and layerskip is bad and will kill the model......	2024-11-01 18:36:44 -05:00
mrq	edf1e66bf9	layerskip_r=6 fries the model so hard the loss is sub-1...	2024-11-01 17:06:07 -05:00
mrq	9b6c57bc57	third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)	2024-11-01 12:50:37 -05:00
mrq	76ebef45dc	off-by-one...	2024-10-31 13:24:48 -05:00
mrq	b63293cbbe	ugh	2024-10-30 22:49:11 -05:00
mrq	a22534e8f4	layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)	2024-10-30 20:05:45 -05:00
mrq	4049f51ba9	added option to load lora directly from the model file itself with --lora	2024-10-26 00:13:10 -05:00
mrq	023c3af331	updated readme to reflect changes	2024-10-25 22:17:05 -05:00
mrq	ccf71dc1b6	added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided	2024-10-25 22:15:15 -05:00
mrq	a96f5aee32	adjusted how i want to pass eval kwargs	2024-10-25 20:38:09 -05:00
mrq	92e6bff6dc	actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time	2024-10-23 00:03:35 -05:00
mrq	8920e5e86b	actually have beam_width in the webUI work	2024-10-22 22:06:22 -05:00
mrq	910571ad34	too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time	2024-10-22 20:13:54 -05:00
mrq	8eb9a4056b	modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling	2024-10-22 18:12:39 -05:00
mrq	1a02cd5bce	modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file	2024-10-21 19:52:02 -05:00
mrq	02dfc60ac3	ugh	2024-10-18 17:23:22 -05:00
mrq	71731ed785	added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)	2024-10-18 17:19:52 -05:00
mrq	6b04c13c56	print warning if audio promptless inferencing with low AR temp (it really doesn't like low temps / greedy sampling)	2024-10-18 17:01:40 -05:00
mrq	c8f31db1de	default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)	2024-10-18 16:58:56 -05:00
mrq	fc8dfd8617	made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)	2024-10-18 16:55:00 -05:00
mrq	07f4935a75	more tweaks	2024-10-18 13:19:36 -05:00
mrq	0dfab973e7	oops	2024-10-18 09:40:06 -05:00
mrq	75b90be325	cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified	2024-10-17 17:06:48 -05:00
mrq	8b6095f681	saner defaults, maybe	2024-10-17 14:37:21 -05:00
mrq	f88097ccf6	add config option to set the rate of sampling randomly vs similar speakers during training	2024-10-16 14:27:58 -05:00
mrq	48461833c2	ugh	2024-10-15 19:30:43 -05:00
mrq	eea70f5698	kludge fix for an oversight in the model when trying to train for longer input prompt durations......	2024-10-15 19:25:03 -05:00
mrq	84005c5b00	entropix apparently processes the entire sequence of logits but it falls apart when doing that	2024-10-13 12:01:12 -05:00
mrq	c800d28bb8	respect attention defined in the yaml for web UI (which might explain why there's been a discrepancy in outputs for me)	2024-10-13 11:02:24 -05:00
mrq	ed6b7a690f	ugh.........	2024-10-13 00:26:46 -05:00
mrq	d405f243d4	at wits end in trying to output the right attention scores	2024-10-12 23:53:13 -05:00
mrq	70cf694cfd	output attention scores for SDPA/flash, since naive attention seems broken	2024-10-12 12:09:17 -05:00
mrq	541e45263c	ugh	2024-10-12 11:29:16 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	666e8038fb	ugh	2024-10-12 10:41:35 -05:00
mrq	3d6ef9666b	overridden naive llama attention to get the right score values that entropix needs	2024-10-12 10:05:47 -05:00
mrq	40b089daf3	lol	2024-10-12 09:57:34 -05:00
mrq	d6f7c86a5c	entropix tweaks (it doesn't output garbage but it loves to go for silence)	2024-10-12 09:46:18 -05:00
mrq	d0ab7d755a	added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix	2024-10-11 22:36:06 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	85d85c1351	more arg creep for demo page	2024-10-10 19:40:01 -05:00
mrq	301468f519	<<	2024-10-10 19:13:52 -05:00
mrq	75a4c866d6	more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)	2024-10-10 19:04:12 -05:00
mrq	96d05be73c	demo page tweaks	2024-10-10 13:52:37 -05:00
mrq	2ea978f318	added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval	2024-10-10 13:40:25 -05:00
mrq	52299127ab	fix vall_e.emb.process	2024-10-08 20:00:34 -05:00
mrq	0656a762af	fix vall_e.emb.transcriber	2024-10-08 19:24:43 -05:00
mrq	acdce66d4e	readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well	2024-10-05 22:53:53 -05:00

627 Commits