|
69b0b3b854
|
set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp
|
2024-11-09 00:11:16 -06:00 |
|
|
5a09a5f6e9
|
I forgot about the time embedding...
|
2024-11-08 22:46:26 -06:00 |
|
|
811b15d280
|
I suppose I just have a shit training method since the sampler is as solid as I can get it...............
|
2024-11-08 22:05:41 -06:00 |
|
|
13b54953bd
|
agony
|
2024-11-08 13:34:39 -06:00 |
|
|
c127c4e488
|
'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)
|
2024-11-07 21:19:14 -06:00 |
|
|
e108c54daf
|
new NAR-len training paradigm......
|
2024-11-07 11:32:11 -06:00 |
|
|
ed174c589e
|
ugh
|
2024-11-07 09:19:21 -06:00 |
|
|
5698188824
|
I'm really such an idiot
|
2024-11-07 09:10:18 -06:00 |
|
|
105ed51159
|
I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something)
|
2024-11-06 19:17:12 -06:00 |
|
|
9e65e05e83
|
more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)
|
2024-11-04 18:00:33 -06:00 |
|
|
d229725c76
|
more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)
|
2024-11-03 18:31:28 -06:00 |
|
|
aee08b7307
|
changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system)
|
2024-11-03 09:58:29 -06:00 |
|
|
3826f9bae4
|
saner mask creation? (it doesn't matter, kv cache won't work)
|
2024-11-02 21:00:21 -05:00 |
|
|
ded746e157
|
very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough)
|
2024-11-02 11:49:05 -05:00 |
|
|
ec79230965
|
shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)
|
2024-11-01 21:30:06 -05:00 |
|
|
9b6c57bc57
|
third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)
|
2024-11-01 12:50:37 -05:00 |
|
|
76ebef45dc
|
off-by-one...
|
2024-10-31 13:24:48 -05:00 |
|
|
b63293cbbe
|
ugh
|
2024-10-30 22:49:11 -05:00 |
|
|
a22534e8f4
|
layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)
|
2024-10-30 20:05:45 -05:00 |
|
|
8eb9a4056b
|
modified default arguments (ar temp = 0 and rep pen = 1.125 seem to be stable, at least given the few things I tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling
|
2024-10-22 18:12:39 -05:00 |
|
|
fc8dfd8617
|
made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)
|
2024-10-18 16:55:00 -05:00 |
|
|
84005c5b00
|
entropix apparently processes the entire sequence of logits but it falls apart when doing that
|
2024-10-13 12:01:12 -05:00 |
|
|
c800d28bb8
|
respect attention defined in the yaml for web UI (which might explain why there's been a discrepancy in outputs for me)
|
2024-10-13 11:02:24 -05:00 |
|
|
d405f243d4
|
at wits end in trying to output the right attention scores
|
2024-10-12 23:53:13 -05:00 |
|
|
04e983b86b
|
modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now
|
2024-10-12 11:27:55 -05:00 |
|
|
666e8038fb
|
ugh
|
2024-10-12 10:41:35 -05:00 |
|
|
d6f7c86a5c
|
entropix tweaks (it doesn't output garbage but it loves to go for silence)
|
2024-10-12 09:46:18 -05:00 |
|
|
d0ab7d755a
|
added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix
|
2024-10-11 22:36:06 -05:00 |
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
acdce66d4e
|
readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well
|
2024-10-05 22:53:53 -05:00 |
|
|
84c7419001
|
faster
|
2024-10-04 22:30:47 -05:00 |
|
|
a507b769a1
|
sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)
|
2024-10-04 22:18:20 -05:00 |
|
|
54203c059d
|
validated rep pen for STT (sometimes needed to wrangle the model)
|
2024-09-08 08:30:30 -05:00 |
|
|
6a967f91b9
|
oops
|
2024-09-07 22:13:49 -05:00 |
|
|
4bd9bb39c8
|
webui for STT (still need to bake the model to handle it better, a few hours so far has it generating what looks like a normal transcription, but it does not correlate to the audio right now)
|
2024-09-06 15:13:04 -05:00 |
|
|
341e19162b
|
fixes, again
|
2024-09-06 11:41:41 -05:00 |
|
|
413097f5f7
|
fixes
|
2024-09-05 21:42:59 -05:00 |
|
|
54547b74d8
|
experimental implementation of STT (need to actually test on a model, test trainer seems to work)
|
2024-09-05 20:43:20 -05:00 |
|
|
b7b99a25f1
|
added ability to specify attention backend for CLI and webui (because I'm tired of editing the yaml)
|
2024-08-26 19:33:51 -05:00 |
|
|
0d706ec6a1
|
added fused_attn (triton-based fused attention) and simply just query for flash_attn under rocm
|
2024-08-26 19:13:34 -05:00 |
|
|
6b0891448c
|
pain (some shit to try and get some flash attention for ROCm (gfx1100) through triton fused attention but no good)
|
2024-08-25 20:07:27 -05:00 |
|
|
40e1799adc
|
fixed xformers and flash_attn to actually work now
|
2024-08-19 01:03:35 -05:00 |
|
|
29c35528e5
|
the sooner I accept there's no FA for V100s the sooner I'll go to bed
|
2024-08-18 23:54:33 -05:00 |
|
|
d636edd3a2
|
added flash_attn LlamaAttention (including flash_attn==1.0.9)
|
2024-08-18 20:51:14 -05:00 |
|
|
2a1794c084
|
ughghghhhh
|
2024-08-09 21:15:01 -05:00 |
|
|
d04f6911b4
|
oops
|
2024-08-08 19:38:55 -05:00 |
|
|
949339a3fa
|
do not include SDPA attention if there are no available SDPA backends
|
2024-08-06 20:42:39 -05:00 |
|
|
7cdfa3dc0c
|
updated process_datasets.py, added argparsing so I can mostly stop manually editing things, and some other cleanup
|
2024-08-05 15:59:25 -05:00 |
|
|
debcc93e7e
|
add adapted MixtralAttention for when I make a bad decision to actually train a MoE
|
2024-08-04 22:03:22 -05:00 |
|
|
3a65cc4b22
|
fix issue with sft and shared tensors...
|
2024-08-04 19:56:21 -05:00 |
|