vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	c00fc18b62	actually use the right embedding for nar-len	2024-11-13 18:04:04 -06:00
mrq	3ea8a610d6	fix STT	2024-11-13 14:27:15 -06:00
mrq	910033343c	overhauled how the right resp level / classifier gets picked to avoid cringemath	2024-11-13 13:31:17 -06:00
mrq	269648605e	move NAR-len rvq level 0 to separate embedding	2024-11-13 11:38:58 -06:00
mrq	29e45be0b4	tweaks to bucket sampling	2024-11-13 11:09:24 -06:00
mrq	b2eca271a8	ugh	2024-11-13 10:35:44 -06:00
mrq	be83ddabaa	better causal-ness for split loss calc, and also do masking for NAR-len for it	2024-11-13 10:17:52 -06:00
mrq	6b76419123	ugh	2024-11-13 09:54:20 -06:00
mrq	ad7cfffc00	NAR-len RVQ-0 was being trained causally.............	2024-11-13 09:43:50 -06:00
mrq	976ee87f6f	resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated	2024-11-13 09:09:28 -06:00
mrq	8286aa54c8	do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so	2024-11-13 09:07:10 -06:00
mrq	caf721c67b	set it to zero because it'll make the stop token hide more often than not	2024-11-12 22:30:50 -06:00
mrq	0f2584eba7	new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)	2024-11-12 22:30:09 -06:00
mrq	663f07038d	haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output))	2024-11-12 16:41:58 -06:00
mrq	b09328069e	actually do CFG sampling for base AR+NAR tasks	2024-11-12 13:42:39 -06:00
mrq	2495a7ef67	Fixed STT in the web UI	2024-11-12 12:49:53 -06:00
mrq	8927bad7bc	actually fixed rep pen (for ar and nar, it seems to help with nar unmasking)	2024-11-11 21:40:19 -06:00
mrq	ec92613847	actually pass input prompt length size to inference	2024-11-11 20:39:48 -06:00
mrq	b1df6a7bed	reverted rep pen sampler due to a regression	2024-11-11 20:35:08 -06:00
mrq	b1f4db39c8	threw in CFG sampling for normal model as well to experiment with	2024-11-11 20:27:38 -06:00
mrq	2f56696506	overhauled inference/sampler kwargs to stop being a bloated mess	2024-11-11 20:21:16 -06:00
mrq	354f8e059d	store dataset hash alongside state dict so it can be ignored if mismatched	2024-11-11 18:16:56 -06:00
mrq	f7b8b1e825	dropped subtrain dataloader since it's useless to duplicate	2024-11-11 17:00:49 -06:00
mrq	cf9df71f2c	use homebrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used)	2024-11-11 16:32:08 -06:00
mrq	a748e223ce	tweaks	2024-11-11 12:40:41 -06:00
mrq	48490757da	fixes	2024-11-10 20:37:50 -06:00
mrq	9def34cd66	lol	2024-11-10 12:48:41 -06:00
mrq	9cb0b6901b	unified nar.py into ar_nar.py	2024-11-10 12:19:48 -06:00
mrq	a9d2faf2d7	all I can do now until I wait for the model to (re)train for pure NAR	2024-11-09 22:57:34 -06:00
mrq	ad7e290a5e	ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup)	2024-11-09 19:40:02 -06:00
mrq	943fe70c10	I don't know why this fixes an assert thrown but it does	2024-11-09 19:04:13 -06:00
mrq	f50d92ba6c	Almost made a mistake	2024-11-09 18:12:54 -06:00
mrq	c6a38693a2	This better work	2024-11-09 18:04:59 -06:00
mrq	8b3d1cf70a	Something's Wrong	2024-11-09 15:07:43 -06:00
mrq	dcd5fecff3	some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audio after 3 seconds or so)	2024-11-09 12:12:46 -06:00
mrq	69b0b3b854	set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp	2024-11-09 00:11:16 -06:00
mrq	5a09a5f6e9	I forgot about the time embedding...	2024-11-08 22:46:26 -06:00
mrq	811b15d280	I suppose I just have a shit training method since the sampler is as solid as I can get it...............	2024-11-08 22:05:41 -06:00
mrq	13b54953bd	agony	2024-11-08 13:34:39 -06:00
mrq	c127c4e488	'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)	2024-11-07 21:19:14 -06:00
mrq	e108c54daf	new NAR-len training paradigm......	2024-11-07 11:32:11 -06:00
mrq	ed174c589e	ugh	2024-11-07 09:19:21 -06:00
mrq	d13ab00ad8	one more note	2024-11-07 09:11:21 -06:00
mrq	5698188824	I really am such an idiot	2024-11-07 09:10:18 -06:00
mrq	77ff23e319	repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more)	2024-11-06 23:29:53 -06:00
mrq	a3bc26f7ec	ugh	2024-11-06 23:16:28 -06:00
mrq	d606a693ff	eval fix for nar-len	2024-11-06 23:14:16 -06:00
mrq	105ed51159	I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something)	2024-11-06 19:17:12 -06:00
mrq	bcabde3454	more notes	2024-11-06 13:51:28 -06:00
mrq	bfc5e1d723	agony	2024-11-05 22:30:49 -06:00
mrq	aefe8fcdad	UGH	2024-11-05 22:13:58 -06:00
mrq	556d9db0d5	web UI support for HF ZeroGPU	2024-11-05 21:38:02 -06:00
mrq	e58a9469a3	move layerskip to experimental settings.......	2024-11-05 20:37:06 -06:00
mrq	bbc2de3713	ugh	2024-11-05 11:50:05 -06:00
mrq	9e65e05e83	more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)	2024-11-04 18:00:33 -06:00
mrq	c83670c38c	Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)	2024-11-03 19:19:15 -06:00
mrq	d229725c76	more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)	2024-11-03 18:31:28 -06:00
mrq	aee08b7307	changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system)	2024-11-03 09:58:29 -06:00
mrq	3826f9bae4	saner mask creation? (it doesnt matter, kv cache wont work)	2024-11-02 21:00:21 -05:00
mrq	ded746e157	very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough)	2024-11-02 11:49:05 -05:00
mrq	62fe5b0943	ughh	2024-11-01 22:36:48 -05:00
mrq	ec79230965	shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)	2024-11-01 21:30:06 -05:00
mrq	ef1c17430f	skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage)	2024-11-01 20:54:53 -05:00
mrq	fb8faa295b	actually float16(+AMP) and layerskip is bad and will kill the model......	2024-11-01 18:36:44 -05:00
mrq	edf1e66bf9	layerskip_r=6 fries the model so hard the loss is sub-1...	2024-11-01 17:06:07 -05:00
mrq	9b6c57bc57	third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)	2024-11-01 12:50:37 -05:00
mrq	76ebef45dc	off-by-one...	2024-10-31 13:24:48 -05:00
mrq	b63293cbbe	ugh	2024-10-30 22:49:11 -05:00
mrq	a22534e8f4	layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)	2024-10-30 20:05:45 -05:00
mrq	4049f51ba9	added option to load lora directly from the model file itself with --lora	2024-10-26 00:13:10 -05:00
mrq	ccf71dc1b6	added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided	2024-10-25 22:15:15 -05:00
mrq	a96f5aee32	adjusted how i want to pass eval kwargs	2024-10-25 20:38:09 -05:00
mrq	92e6bff6dc	actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time	2024-10-23 00:03:35 -05:00
mrq	8920e5e86b	actually have beam_width in the webUI work	2024-10-22 22:06:22 -05:00
mrq	910571ad34	too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time	2024-10-22 20:13:54 -05:00
mrq	8eb9a4056b	modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling	2024-10-22 18:12:39 -05:00
mrq	1a02cd5bce	modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file	2024-10-21 19:52:02 -05:00
mrq	02dfc60ac3	ugh	2024-10-18 17:23:22 -05:00
mrq	71731ed785	added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)	2024-10-18 17:19:52 -05:00
mrq	6b04c13c56	print warning if audio promptless inferencing with low AR temp (it really doesn't like low temps / greedy sampling)	2024-10-18 17:01:40 -05:00
mrq	c8f31db1de	default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)	2024-10-18 16:58:56 -05:00
mrq	fc8dfd8617	made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)	2024-10-18 16:55:00 -05:00
mrq	07f4935a75	more tweaks	2024-10-18 13:19:36 -05:00
mrq	0dfab973e7	oops	2024-10-18 09:40:06 -05:00
mrq	75b90be325	cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified	2024-10-17 17:06:48 -05:00
mrq	8b6095f681	saner defaults, maybe	2024-10-17 14:37:21 -05:00
mrq	f88097ccf6	add config option to set the rate of sampling randomly vs similar speakers during training	2024-10-16 14:27:58 -05:00
mrq	48461833c2	ugh	2024-10-15 19:30:43 -05:00
mrq	eea70f5698	kludge fix for an oversight in the model when trying to train for longer input prompt durations......	2024-10-15 19:25:03 -05:00
mrq	84005c5b00	entropix apparently processes the entire sequence of logits but it falls apart when doing that	2024-10-13 12:01:12 -05:00
mrq	c800d28bb8	respect attention defined in the yaml for web UI (which might explain why theres been a discrepancy in outputs for me)	2024-10-13 11:02:24 -05:00
mrq	ed6b7a690f	ugh.........	2024-10-13 00:26:46 -05:00
mrq	d405f243d4	at wits end in trying to output the right attention scores	2024-10-12 23:53:13 -05:00
mrq	70cf694cfd	output attention scores for SDPA/flash, since naive attention seems broken	2024-10-12 12:09:17 -05:00
mrq	541e45263c	ugh	2024-10-12 11:29:16 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	666e8038fb	ugh	2024-10-12 10:41:35 -05:00
mrq	3d6ef9666b	overridden naive llama attention to get the right score values that entropix needs	2024-10-12 10:05:47 -05:00
mrq	40b089daf3	lol	2024-10-12 09:57:34 -05:00
mrq	d6f7c86a5c	entropix tweaks (it doesn't output garbage but it loves to go for silence)	2024-10-12 09:46:18 -05:00

619 Commits