02a8bcbe29
fixed errant index error (although it makes me wonder if my segmented masking is still flawed)
2025-03-21 23:41:34 -05:00

d1d91295b3
add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation...
2025-03-21 19:05:49 -05:00
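For reference, a segment-restricted attention mask of the kind this commit describes can be pictured with the sketch below. It is illustrative only, under assumed names and shapes, and is not the repo's actual mask-generation code:

```python
# Illustrative sketch (not the repo's implementation): build a boolean attention
# mask where each query may only attend causally to keys in its own segment,
# optionally further restricted to a trailing sliding window.
from typing import Optional

import torch

def segmented_attention_mask(segment_ids: torch.Tensor, window: Optional[int] = None) -> torch.Tensor:
    # segment_ids: [batch, seq_len] integer segment label per token
    same_segment = segment_ids.unsqueeze(-1) == segment_ids.unsqueeze(-2)   # [B, L, L]
    pos = torch.arange(segment_ids.shape[-1], device=segment_ids.device)
    causal = pos.unsqueeze(-1) >= pos.unsqueeze(-2)                         # query index >= key index
    mask = same_segment & causal
    if window is not None:
        mask = mask & ((pos.unsqueeze(-1) - pos.unsqueeze(-2)) < window)    # sliding window
    return mask  # True = may attend, False = masked out

# e.g. one batch item with a 3-token segment followed by a 2-token segment:
# segmented_attention_mask(torch.tensor([[0, 0, 0, 1, 1]]), window=2)
```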
5479d2eacc
more tweaks to the new implementation (properly trim the len stuff to save some params, set the decoder's d_ffn expansion to 2 to maybe also make it faster, etc.)
2025-03-18 19:34:37 -05:00
5c512717a6
len prediction for new model (and remove logit normalization since it kills inferencing)
2025-03-11 20:33:09 -05:00

5670fcb23f
hopefully the final tweaks needed for this bastard of a model
2025-03-10 20:59:11 -05:00

00d1fed217
another optimization (within the dataloader because the similar utterance sampler was mondo slow)
2025-03-08 17:10:50 -06:00

89e52b9877
ugh
2025-03-07 13:55:57 -06:00

6afc2b7526
gut feeling to change the attention mask
2025-03-07 13:51:59 -06:00

2fb2b732fc
wow that was fast
2025-03-04 23:17:18 -06:00

462f71e2f7
ugh
2025-03-04 14:57:00 -06:00

1cd24f3381
a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it)
2025-03-04 14:53:02 -06:00

b97faa8173
fixes...
2025-02-28 18:53:07 -06:00
a174c33db6
a gorillionth time's the charm (aka: the encoder/decoder approach is a tough pill to swallow)
2025-02-28 17:56:50 -06:00
0a45c9c042
fix attention backend not being used
2025-02-27 21:38:38 -06:00
eff180248c
decoupled the llama backend to avoid any funny changes from transformers; removed the other backends since I don't think I'll ever bother using them
2025-02-27 19:00:37 -06:00

6634d07576
added the muon optimizer through kludge hacks, because it necessitates a second optimizer in tandem, which seems to only sometimes work with deepspeed
2025-02-23 11:22:13 -06:00
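The tandem-optimizer requirement comes from Muon typically being applied only to 2-D weight matrices, with everything else handed to a second optimizer such as AdamW. A hedged sketch of that split follows; the parameter split heuristic and learning rates are assumptions, and SGD stands in for the real Muon class purely so the snippet runs:

```python
# Sketch of pairing Muon (2-D weight matrices) with AdamW (everything else).
# SGD below is only a runnable stand-in; substitute the actual Muon implementation.
import torch

Muon = torch.optim.SGD  # stand-in only, not the real optimizer

def build_optimizers(model: torch.nn.Module):
    matrix_params, other_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # heuristic split: 2-D weights (excluding embeddings) go to Muon, the rest to AdamW
        if p.ndim == 2 and "embed" not in name:
            matrix_params.append(p)
        else:
            other_params.append(p)
    muon = Muon(matrix_params, lr=2e-2)
    adamw = torch.optim.AdamW(other_params, lr=1e-4)
    return muon, adamw

# the training loop then has to step and zero both optimizers, which is exactly
# where a framework that expects a single optimizer (e.g. deepspeed) gets awkward.
```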
04fef5dad5
agony
2025-02-12 00:18:24 -06:00

075ffef68a
ugh
2025-02-09 13:02:51 -06:00
bb2ebe1ca2
fixed issues that may arise from updating transformers with attention; added nvidia/audio-codec-44khz backend support (by gutting everything necessary, because I do NOT want to install more dependencies)
2025-02-04 20:30:07 -06:00
0841f366e8
I should really just grab modelling_llama wholesale (fix for the adapted attention class)
2025-01-28 21:55:05 -06:00

e5f9da2221
oops
2025-01-21 11:59:24 -06:00

69c1d2991f
updated mixtral backend (need this for something else)
2025-01-20 21:50:56 -06:00
ca31da0a95
sageattn (forgot to bother with testing this the other day, seems fine)
2024-12-03 15:14:57 -06:00
84a05acb6d
touch ups in docs
2024-12-02 19:10:42 -06:00

dcaf38b359
fixed training tqdm being stubborn
2024-11-23 09:45:23 -06:00

41d7c30ea5
added much cleaner non-causal mask generation
2024-11-22 19:43:32 -06:00
c99a74e834
actually generate a causal mask, since it seems one is sometimes not generated because of assumptions made elsewhere
2024-11-22 18:30:24 -06:00
ccee5fc11c
that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one
2024-11-22 16:51:50 -06:00
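To make the sdpa point concrete: PyTorch's scaled_dot_product_attention only builds a causal mask implicitly when is_causal=True and no attn_mask is supplied, so once an explicit mask is always passed, that mask itself has to encode causal vs. non-causal behavior. A minimal sketch, with illustrative shapes rather than the repo's attention code:

```python
# Sketch: an explicit attn_mask replaces is_causal, so causal vs. non-causal
# behavior has to be baked into the mask that gets passed in.
import torch
import torch.nn.functional as F

B, H, L, D = 1, 2, 8, 16
q = k = v = torch.randn(B, H, L, D)

# non-causal: every query may attend to every key (a padding-only mask would go here)
non_causal_mask = torch.ones(L, L, dtype=torch.bool)

# causal: query i may only attend to keys <= i
causal_mask = torch.tril(torch.ones(L, L, dtype=torch.bool))

out_non_causal = F.scaled_dot_product_attention(q, k, v, attn_mask=non_causal_mask)
out_causal = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)

# is_causal=True is only a shortcut for the tril mask when no attn_mask is given;
# passing an all-True mask does NOT make the attention causal.
```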
4aa685e749
what has science done
2024-11-22 16:45:40 -06:00

147219a5e0
huge oversight in the attention masking... (I realized I have not been providing a non-causal mask to non-causal tasks)
2024-11-22 13:44:43 -06:00
24d888c47c
temporarily dropping support for xformers because it breaks when using an attention mask (which I don't remember commenting out when it gets passed); default to not using wandb because it's being a pain when doing tests and not actual sessions
2024-11-22 11:29:12 -06:00
2cef97e43f
cleanup
2024-11-21 23:08:43 -06:00

c6a38693a2
This better work
2024-11-09 18:04:59 -06:00

c83670c38c
Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)
2024-11-03 19:19:15 -06:00

ded746e157
very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough)
2024-11-02 11:49:05 -05:00
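The "is the current layer's state good enough" check can be pictured roughly as in the sketch below, which reuses the final norm and LM head on intermediate hidden states and exits once the next-token confidence clears a threshold. The module names, the simplified layer call, and the threshold test are all assumptions, not the repo's inference path:

```python
# Naive early-exit sketch: project each layer's hidden state through the shared
# norm + LM head and stop as soon as the next-token prediction looks confident.
import torch

@torch.no_grad()
def naive_early_exit_logits(layers, norm, lm_head, hidden, threshold: float = 0.9):
    # layers: iterable of callables mapping hidden -> hidden (simplified signature)
    # hidden: [B, L, d_model] input hidden states
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        logits = lm_head(norm(hidden))            # reuse the shared head at every layer
        probs = logits[:, -1].softmax(dim=-1)     # confidence for the next token
        if probs.max().item() >= threshold:
            return logits, i                      # exit early at this layer
    return logits, len(layers) - 1                # fell through to the final layer
```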
ec79230965
shuffled web UI options hidden by cfg.experimental into their own tab; exposed early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)
2024-11-01 21:30:06 -05:00

fb8faa295b
actually, float16 (+AMP) with layerskip is bad and will kill the model...
2024-11-01 18:36:44 -05:00
9b6c57bc57
third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)
2024-11-01 12:50:37 -05:00
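Treating the early-exit loss as an auxiliary term alongside the normal loss can be sketched roughly as below; the averaging over layers and the aux weight are assumptions rather than the exact recipe used here:

```python
# Sketch: fold per-layer early-exit cross-entropy into the main loss as an
# auxiliary term, the same way a MoE router's aux loss is added.
import torch
import torch.nn.functional as F

def total_loss(per_layer_logits, targets, aux_weight: float = 0.1):
    # per_layer_logits: list of [B, L, vocab] logits, one entry per decoder layer
    # targets: [B, L] token ids
    main = F.cross_entropy(per_layer_logits[-1].flatten(0, 1), targets.flatten())
    aux = sum(
        F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        for logits in per_layer_logits[:-1]
    ) / max(len(per_layer_logits) - 1, 1)
    return main + aux_weight * aux
```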
76ebef45dc
off-by-one...
2024-10-31 13:24:48 -05:00

b63293cbbe
ugh
2024-10-30 22:49:11 -05:00

a22534e8f4
layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)
2024-10-30 20:05:45 -05:00

fc8dfd8617
made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)
2024-10-18 16:55:00 -05:00

84005c5b00
entropix apparently processes the entire sequence of logits but it falls apart when doing that
2024-10-13 12:01:12 -05:00
c800d28bb8
respect the attention setting defined in the yaml for the web UI (which might explain why there's been a discrepancy in outputs for me)
2024-10-13 11:02:24 -05:00
ed6b7a690f
ugh...
2024-10-13 00:26:46 -05:00

d405f243d4
at wits' end trying to output the right attention scores
2024-10-12 23:53:13 -05:00

70cf694cfd
output attention scores for SDPA/flash, since naive attention seems broken
2024-10-12 12:09:17 -05:00
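Since fused SDPA/flash kernels never materialize the attention weights, "outputting attention scores" amounts to recomputing them explicitly next to the fused call, roughly as in this sketch (shapes and names are illustrative, not the repo's attention class):

```python
# Sketch: run the fused SDPA kernel for the output, and recompute the attention
# weights explicitly so they can be returned as "scores".
import math
import torch
import torch.nn.functional as F

def sdpa_with_scores(q, k, v, attn_mask=None):
    # q, k, v: [B, H, L, D]; attn_mask: broadcastable boolean mask, True = attend
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

    # weights the fused kernel never exposes
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])   # [B, H, L, L]
    if attn_mask is not None:
        scores = scores.masked_fill(~attn_mask, float("-inf"))
    weights = scores.softmax(dim=-1)
    return out, weights
```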
04e983b86b
modified the demo page to be more modular for demoing comparisons; actually provide a path to use the modified naive attention; entropix sampling is no longer tied to an experimental yaml flag
2024-10-12 11:27:55 -05:00

3d6ef9666b
overrode the naive llama attention to get the right score values that entropix needs
2024-10-12 10:05:47 -05:00
168e203942
ugh
2024-08-30 14:39:07 -05:00