|
29e45be0b4
|
tweaks to bucket sampling
|
2024-11-13 11:09:24 -06:00 |
|
|
b2eca271a8
|
ugh
|
2024-11-13 10:35:44 -06:00 |
|
|
be83ddabaa
|
better causal-ness for split loss calc, and also do masking for NAR-len for it
|
2024-11-13 10:17:52 -06:00 |
|
|
6b76419123
|
ugh
|
2024-11-13 09:54:20 -06:00 |
|
|
ad7cfffc00
|
NAR-len RVQ-0 was being trained causally.............
|
2024-11-13 09:43:50 -06:00 |
|
|
976ee87f6f
|
resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated
|
2024-11-13 09:09:28 -06:00 |
|
|
8286aa54c8
|
do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so
|
2024-11-13 09:07:10 -06:00 |
|
|
caf721c67b
|
set it to zero because it'll make the stop token hide more often than not
|
2024-11-12 22:30:50 -06:00 |
|
|
0f2584eba7
|
new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)
|
2024-11-12 22:30:09 -06:00 |
|
|
663f07038d
|
haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output))
|
2024-11-12 16:41:58 -06:00 |
|
|
b09328069e
|
actually do CFG sampling for base AR+NAR tasks
|
2024-11-12 13:42:39 -06:00 |
|
|
2495a7ef67
|
Fixed STT in the web UI
|
2024-11-12 12:49:53 -06:00 |
|
|
8927bad7bc
|
actually fixed rep pen (for ar and nar, it seems to help with nar unmasking)
|
2024-11-11 21:40:19 -06:00 |
|
|
ec92613847
|
actually pass input prompt length size to inference
|
2024-11-11 20:39:48 -06:00 |
|
|
b1df6a7bed
|
reverted rep pen sampler due to a regression
|
2024-11-11 20:35:08 -06:00 |
|
|
b1f4db39c8
|
threw in CFG sampling for normal model as well to experiment with
|
2024-11-11 20:27:38 -06:00 |
|
|
2f56696506
|
overhauled inference/sampler kwargs to stop being a bloated mess
|
2024-11-11 20:21:16 -06:00 |
|
|
354f8e059d
|
store dataset hash alongside state dict so it can be ignored if mismatched
|
2024-11-11 18:16:56 -06:00 |
|
|
f7b8b1e825
|
dropped subtrain dataloader since its useless to duplicate
|
2024-11-11 17:00:49 -06:00 |
|
|
cf9df71f2c
|
use homwbrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used)
|
2024-11-11 16:32:08 -06:00 |
|
|
a748e223ce
|
tweaks
|
2024-11-11 12:40:41 -06:00 |
|
|
48490757da
|
fixes
|
2024-11-10 20:37:50 -06:00 |
|
|
9def34cd66
|
lol
|
2024-11-10 12:48:41 -06:00 |
|
|
9cb0b6901b
|
unified nar.py into ar_nar.py
|
2024-11-10 12:19:48 -06:00 |
|
|
a9d2faf2d7
|
all I can do now until I wait for the model to (re)train for pure NAR
|
2024-11-09 22:57:34 -06:00 |
|
|
ad7e290a5e
|
ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup)
|
2024-11-09 19:40:02 -06:00 |
|
|
943fe70c10
|
I don't know why this fixes an assert thrown but it does
|
2024-11-09 19:04:13 -06:00 |
|
|
f50d92ba6c
|
Almost made a mistake
|
2024-11-09 18:12:54 -06:00 |
|
|
c6a38693a2
|
This better work
|
2024-11-09 18:04:59 -06:00 |
|
|
8b3d1cf70a
|
Something's Wrong
|
2024-11-09 15:07:43 -06:00 |
|
|
dcd5fecff3
|
some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audo after 3 seconds or so)
|
2024-11-09 12:12:46 -06:00 |
|
|
69b0b3b854
|
set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp
|
2024-11-09 00:11:16 -06:00 |
|
|
5a09a5f6e9
|
I forgot about the time embedding...
|
2024-11-08 22:46:26 -06:00 |
|
|
811b15d280
|
I suppose I just have a shit training method since the sampler is as solid as I can get it...............
|
2024-11-08 22:05:41 -06:00 |
|
|
13b54953bd
|
agony
|
2024-11-08 13:34:39 -06:00 |
|
|
c127c4e488
|
'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)
|
2024-11-07 21:19:14 -06:00 |
|
|
e108c54daf
|
new NAR-len training paradigm......
|
2024-11-07 11:32:11 -06:00 |
|
|
ed174c589e
|
ugh
|
2024-11-07 09:19:21 -06:00 |
|
|
d13ab00ad8
|
one more note
|
2024-11-07 09:11:21 -06:00 |
|
|
5698188824
|
あたしって、ほんとバカ
|
2024-11-07 09:10:18 -06:00 |
|
|
77ff23e319
|
repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more)
|
2024-11-06 23:29:53 -06:00 |
|
|
a3bc26f7ec
|
ugh
|
2024-11-06 23:16:28 -06:00 |
|
|
d606a693ff
|
eval fix for nar-len
|
2024-11-06 23:14:16 -06:00 |
|
|
105ed51159
|
I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something)
|
2024-11-06 19:17:12 -06:00 |
|
|
bcabde3454
|
more notes
|
2024-11-06 13:51:28 -06:00 |
|
|
bfc5e1d723
|
agony
|
2024-11-05 22:30:49 -06:00 |
|
|
aefe8fcdad
|
UGH
|
2024-11-05 22:13:58 -06:00 |
|
|
556d9db0d5
|
web UI support for HF ZeroGPU
|
2024-11-05 21:38:02 -06:00 |
|
|
e58a9469a3
|
move layerskip to experimental settings.......
|
2024-11-05 20:37:06 -06:00 |
|
|
bbc2de3713
|
ugh
|
2024-11-05 11:50:05 -06:00 |
|