204 Commits

Author SHA1 Message Date
mrq
69b0b3b854 set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp 2024-11-09 00:11:16 -06:00
mrq
5a09a5f6e9 I forgot about the time embedding... 2024-11-08 22:46:26 -06:00
mrq
811b15d280 I suppose I just have a shit training method since the sampler is as solid as I can get it............... 2024-11-08 22:05:41 -06:00
mrq
13b54953bd agony 2024-11-08 13:34:39 -06:00
mrq
c127c4e488 'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough) 2024-11-07 21:19:14 -06:00
mrq
e108c54daf new NAR-len training paradigm...... 2024-11-07 11:32:11 -06:00
mrq
ed174c589e ugh 2024-11-07 09:19:21 -06:00
mrq
5698188824 I'm really such an idiot 2024-11-07 09:10:18 -06:00
mrq
105ed51159 I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something) 2024-11-06 19:17:12 -06:00
mrq
9e65e05e83 more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm) 2024-11-04 18:00:33 -06:00
mrq
d229725c76 more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.) 2024-11-03 18:31:28 -06:00
mrq
aee08b7307 changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system) 2024-11-03 09:58:29 -06:00
mrq
3826f9bae4 saner mask creation? (it doesn't matter, kv cache won't work) 2024-11-02 21:00:21 -05:00
mrq
ded746e157 very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough) 2024-11-02 11:49:05 -05:00
mrq
ec79230965 shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation) 2024-11-01 21:30:06 -05:00
mrq
9b6c57bc57 third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router) 2024-11-01 12:50:37 -05:00
mrq
76ebef45dc off-by-one... 2024-10-31 13:24:48 -05:00
mrq
b63293cbbe ugh 2024-10-30 22:49:11 -05:00
mrq
a22534e8f4 layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this) 2024-10-30 20:05:45 -05:00
mrq
8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq
fc8dfd8617 made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar) 2024-10-18 16:55:00 -05:00
mrq
84005c5b00 entropix apparently processes the entire sequence of logits but it falls apart when doing that 2024-10-13 12:01:12 -05:00
mrq
c800d28bb8 respect attention defined in the yaml for web UI (which might explain why there's been a discrepancy in outputs for me) 2024-10-13 11:02:24 -05:00
mrq
d405f243d4 at wits end in trying to output the right attention scores 2024-10-12 23:53:13 -05:00
mrq
04e983b86b modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now 2024-10-12 11:27:55 -05:00
mrq
666e8038fb ugh 2024-10-12 10:41:35 -05:00
mrq
d6f7c86a5c entropix tweaks (it doesn't output garbage but it loves to go for silence) 2024-10-12 09:46:18 -05:00
mrq
d0ab7d755a added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix 2024-10-11 22:36:06 -05:00
mrq
bef43a0c18 added experimental entropix sampling support 2024-10-11 21:18:26 -05:00
mrq
acdce66d4e readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well 2024-10-05 22:53:53 -05:00
mrq
84c7419001 faster 2024-10-04 22:30:47 -05:00
mrq
a507b769a1 sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit) 2024-10-04 22:18:20 -05:00
mrq
54203c059d validated rep pen for STT (sometimes needed to wrangle the model) 2024-09-08 08:30:30 -05:00
mrq
6a967f91b9 oops 2024-09-07 22:13:49 -05:00
mrq
4bd9bb39c8 webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now) 2024-09-06 15:13:04 -05:00
mrq
341e19162b fixes, again 2024-09-06 11:41:41 -05:00
mrq
413097f5f7 fixes 2024-09-05 21:42:59 -05:00
mrq
54547b74d8 experimental implementation of STT (need to actually test on a model, test trainer seems to work) 2024-09-05 20:43:20 -05:00
mrq
b7b99a25f1 added ability to specify attention backend for CLI and webui (because I'm tired of editing the yaml) 2024-08-26 19:33:51 -05:00
mrq
0d706ec6a1 added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm 2024-08-26 19:13:34 -05:00
mrq
6b0891448c pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good) 2024-08-25 20:07:27 -05:00
mrq
40e1799adc fixed xformers and flash_attn to actually work now 2024-08-19 01:03:35 -05:00
mrq
29c35528e5 the sooner I accept there's no FA for V100s the sooner I'll go to bed 2024-08-18 23:54:33 -05:00
mrq
d636edd3a2 added flash_attn LlamaAttention (including flash_attn==1.0.9) 2024-08-18 20:51:14 -05:00
mrq
2a1794c084 ughghghhhh 2024-08-09 21:15:01 -05:00
mrq
d04f6911b4 oops 2024-08-08 19:38:55 -05:00
mrq
949339a3fa do not include SDPA attention if there's no available SDPA backends 2024-08-06 20:42:39 -05:00
mrq
7cdfa3dc0c updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup 2024-08-05 15:59:25 -05:00
mrq
debcc93e7e add adapted MixtralAttention for when I make a bad decision to actually train a MoE 2024-08-04 22:03:22 -05:00
mrq
3a65cc4b22 fix issue with sft and shared tensors... 2024-08-04 19:56:21 -05:00