vall-e

mrq/vall-e

Author	SHA1	Message	Date
mrq	42fafbaaca	actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)	2024-12-06 21:55:20 -06:00
mrq	23d402bf01	added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)	2024-12-05 23:05:52 -06:00
mrq	84a05acb6d	touch ups in docs	2024-12-02 19:10:42 -06:00
mrq	4aa685e749	what has science done	2024-11-22 16:45:40 -06:00
mrq	147219a5e0	huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks)	2024-11-22 13:44:43 -06:00
mrq	8aafae91fd	dont use timeembedding	2024-11-21 23:14:52 -06:00
mrq	2cef97e43f	cleanup	2024-11-21 23:08:43 -06:00
mrq	190a917b3e	I did it.	2024-11-19 12:24:33 -06:00
mrq	6cfdf94bf9	swap priority to use nar-len if available, added notes	2024-11-18 09:40:04 -06:00
mrq	069b27570f	set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)	2024-11-17 17:04:07 -06:00
mrq	88d840218d	default set cfg strength to 3.0 since the reference model is updated	2024-11-17 10:23:40 -06:00
mrq	a3e1fa3518	ugh	2024-11-17 09:28:33 -06:00
mrq	23fdba0c98	tweaks and changes	2024-11-16 15:49:06 -06:00
mrq	2fbeacfe92	ugh	2024-11-14 22:18:33 -06:00
mrq	39096f8ff3	redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)	2024-11-14 22:17:47 -06:00
mrq	e412e98125	ugh	2024-11-14 07:34:22 -06:00
mrq	c00fc18b62	actually use the right embedding for nar-len	2024-11-13 18:04:04 -06:00
mrq	3ea8a610d6	fix STT	2024-11-13 14:27:15 -06:00
mrq	910033343c	overhauled how the right resp level / classifier gets picked to avoid cringemath	2024-11-13 13:31:17 -06:00
mrq	269648605e	move NAR-len rvq level 0 to separate embedding	2024-11-13 11:38:58 -06:00
mrq	be83ddabaa	better causal-ness for split loss calc, and also do masking for NAR-len for it	2024-11-13 10:17:52 -06:00
mrq	6b76419123	ugh	2024-11-13 09:54:20 -06:00
mrq	ad7cfffc00	NAR-len RVQ-0 was being trained causally.............	2024-11-13 09:43:50 -06:00
mrq	8286aa54c8	do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so	2024-11-13 09:07:10 -06:00
mrq	0f2584eba7	new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)	2024-11-12 22:30:09 -06:00
mrq	663f07038d	haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output))	2024-11-12 16:41:58 -06:00
mrq	8927bad7bc	actually fixed rep pen (for ar and nar, it seems to help with nar unmasking)	2024-11-11 21:40:19 -06:00
mrq	2f56696506	overhauled inference/sampler kwargs to stop being a bloated mess	2024-11-11 20:21:16 -06:00
mrq	9cb0b6901b	unified nar.py into ar_nar.py	2024-11-10 12:19:48 -06:00
mrq	a9d2faf2d7	all I can do now until I wait for the model to (re)train for pure NAR	2024-11-09 22:57:34 -06:00
mrq	ad7e290a5e	ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup)	2024-11-09 19:40:02 -06:00
mrq	943fe70c10	I don't know why this fixes an assert thrown but it does	2024-11-09 19:04:13 -06:00
mrq	f50d92ba6c	Almost made a mistake	2024-11-09 18:12:54 -06:00
mrq	c6a38693a2	This better work	2024-11-09 18:04:59 -06:00
mrq	8b3d1cf70a	Something's Wrong	2024-11-09 15:07:43 -06:00
mrq	69b0b3b854	set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp	2024-11-09 00:11:16 -06:00
mrq	5a09a5f6e9	I forgot about the time embedding...	2024-11-08 22:46:26 -06:00
mrq	811b15d280	I suppose I just have a shit training method since the sampler is as solid as I can get it...............	2024-11-08 22:05:41 -06:00
mrq	13b54953bd	agony	2024-11-08 13:34:39 -06:00
mrq	c127c4e488	'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)	2024-11-07 21:19:14 -06:00
mrq	e108c54daf	new NAR-len training paradigm......	2024-11-07 11:32:11 -06:00
mrq	ed174c589e	ugh	2024-11-07 09:19:21 -06:00
mrq	5698188824	あたしって、ほんとバカ (I'm really such an idiot)	2024-11-07 09:10:18 -06:00
mrq	105ed51159	I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something)	2024-11-06 19:17:12 -06:00
mrq	9e65e05e83	more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)	2024-11-04 18:00:33 -06:00
mrq	d229725c76	more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)	2024-11-03 18:31:28 -06:00
mrq	aee08b7307	changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system)	2024-11-03 09:58:29 -06:00
mrq	3826f9bae4	saner mask creation? (it doesnt matter, kv cache wont work)	2024-11-02 21:00:21 -05:00
mrq	ded746e157	very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough)	2024-11-02 11:49:05 -05:00
mrq	ec79230965	shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)	2024-11-01 21:30:06 -05:00
mrq	9b6c57bc57	third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)	2024-11-01 12:50:37 -05:00
mrq	76ebef45dc	off-by-one...	2024-10-31 13:24:48 -05:00
mrq	b63293cbbe	ugh	2024-10-30 22:49:11 -05:00
mrq	a22534e8f4	layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)	2024-10-30 20:05:45 -05:00
mrq	8eb9a4056b	modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling	2024-10-22 18:12:39 -05:00
mrq	fc8dfd8617	made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)	2024-10-18 16:55:00 -05:00
mrq	84005c5b00	entropix apparently processes the entire sequence of logits but it falls apart when doing that	2024-10-13 12:01:12 -05:00
mrq	c800d28bb8	respect attention defined in the yaml for web UI (which might explain why theres been a discrepancy in outputs for me)	2024-10-13 11:02:24 -05:00
mrq	d405f243d4	at wits end in trying to output the right attention scores	2024-10-12 23:53:13 -05:00
mrq	04e983b86b	modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now	2024-10-12 11:27:55 -05:00
mrq	666e8038fb	ugh	2024-10-12 10:41:35 -05:00
mrq	d6f7c86a5c	entropix tweaks (it doesn't output garbage but it loves to go for silence)	2024-10-12 09:46:18 -05:00
mrq	d0ab7d755a	added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix	2024-10-11 22:36:06 -05:00
mrq	bef43a0c18	added experimental entropix sampling support	2024-10-11 21:18:26 -05:00
mrq	acdce66d4e	readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well	2024-10-05 22:53:53 -05:00
mrq	84c7419001	faster	2024-10-04 22:30:47 -05:00
mrq	a507b769a1	sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)	2024-10-04 22:18:20 -05:00
mrq	54203c059d	validated rep pen for STT (sometimes needed to wrangle the model)	2024-09-08 08:30:30 -05:00
mrq	6a967f91b9	oops	2024-09-07 22:13:49 -05:00
mrq	4bd9bb39c8	webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now)	2024-09-06 15:13:04 -05:00
mrq	341e19162b	fixes, again	2024-09-06 11:41:41 -05:00
mrq	413097f5f7	fixes	2024-09-05 21:42:59 -05:00
mrq	54547b74d8	experimental implementation of STT (need to actually test on a model, test trainer seems to work)	2024-09-05 20:43:20 -05:00
mrq	b7b99a25f1	added ability to specify attention backend for CLI and webui (because im tired of editing the yaml)	2024-08-26 19:33:51 -05:00
mrq	0d706ec6a1	added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm	2024-08-26 19:13:34 -05:00
mrq	6b0891448c	pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good)	2024-08-25 20:07:27 -05:00
mrq	40e1799adc	fixed xformers and flash_attn to actually work now	2024-08-19 01:03:35 -05:00
mrq	29c35528e5	the sooner I accept there's no FA for V100s the sooner I'll go to bed	2024-08-18 23:54:33 -05:00
mrq	d636edd3a2	added flash_attn LlamaAttention (including flash_attn==1.0.9)	2024-08-18 20:51:14 -05:00
mrq	2a1794c084	ughghghhhh	2024-08-09 21:15:01 -05:00
mrq	d04f6911b4	oops	2024-08-08 19:38:55 -05:00
mrq	949339a3fa	do not include SDPA attention if there's no available SDPA backends	2024-08-06 20:42:39 -05:00
mrq	7cdfa3dc0c	updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup	2024-08-05 15:59:25 -05:00
mrq	debcc93e7e	add adapted MixtralAttention for when I make a bad decision to actually train a MoE	2024-08-04 22:03:22 -05:00
mrq	3a65cc4b22	fix issue with sft and shared tensors...	2024-08-04 19:56:21 -05:00
mrq	23f3b56fda	oops	2024-08-04 08:18:57 -05:00
mrq	6a733eb2ed	changed torch.Tensor().to(device, dtype) to just torch.tensor(..., device, dtype) because it's been bothering my autism that I'm creating tensors then converting rather than creating with the right device/dtype, some 'optimization' to compile the model but it doesnt seem to do anything useful	2024-08-03 22:10:21 -05:00
mrq	d0a5c7eca2	more coping with the NAR len	2024-08-03 20:23:36 -05:00
mrq	11fa3da665	some cleanup, fixed the wrapper attention to explicitly use other sdpa backends	2024-08-03 19:51:00 -05:00
mrq	9564ecda43	wrapper attention class for other sdpa backends + xformers seems to have broke...	2024-08-03 15:12:11 -05:00
mrq	9e1989be1b	tweaked initial NAR pass's initial token embeddings to use a different value, or something	2024-08-03 09:01:37 -05:00
mrq	26f74c5739	somehow fixed non-unified position IDs for the NAR-len	2024-08-03 08:43:42 -05:00
mrq	66407e5bdb	tweaks for the NAR-len model, maybe	2024-08-03 08:40:39 -05:00
mrq	97c5241bef	fixes, throw an exception when using NAR only model with non-unified position IDs, since for some reason it outputs garbage for the NAR	2024-08-02 22:25:49 -05:00
mrq	b4c895114c	naive model offloading support (handles automatically splitting parts of the model to requested device per memory constraints, either inferred or requested in the yaml, input tensors are automatically migrated to the right device, it SEEMS to work for training under the test trainer when split between GPU and CPU) (this was specifically only because that Flux imagegen model released so I can test it there)	2024-08-01 20:12:06 -05:00
mrq	387358bc8a	fixes for the NAR-len model, documented some config options, and a better way to handle resizing modules on state_dict load	2024-07-31 20:35:09 -05:00
mrq	07f8e2ad06	added option to set the causal size (how many tokens to sample per AR step), but requires the model to be trained for this (which explains why recurrent chunk sampling just doesn't work for the retnet tests, obvious in hindsight)	2024-07-30 20:53:51 -05:00
mrq	ebf848d249	possible speedup for samplers that require a list of previous tokens (the DRY sampler made me realize that I should copy the tolist() thing from the rep pen sampler for everything else)	2024-07-29 20:23:26 -05:00
mrq	55b0121b1a	trying (and failing) to nail a weird regression in fancier attentions	2024-07-29 19:53:37 -05:00
mrq	c2f5b916fc	added what I think is DRY sampling	2024-07-29 19:15:07 -05:00


289 Commits