Commit Graph

591 Commits

Author SHA1 Message Date
mrq
e412e98125 ugh 2024-11-14 07:34:22 -06:00
mrq
c00fc18b62 actually use the right embedding for nar-len 2024-11-13 18:04:04 -06:00
mrq
3ea8a610d6 fix STT 2024-11-13 14:27:15 -06:00
mrq
910033343c overhauled how the right resp level / classifier gets picked to avoid cringemath 2024-11-13 13:31:17 -06:00
mrq
269648605e move NAR-len rvq level 0 to separate embedding 2024-11-13 11:38:58 -06:00
mrq
29e45be0b4 tweaks to bucket sampling 2024-11-13 11:09:24 -06:00
mrq
b2eca271a8 ugh 2024-11-13 10:35:44 -06:00
mrq
be83ddabaa better causal-ness for split loss calc, and also do masking for NAR-len for it 2024-11-13 10:17:52 -06:00
mrq
6b76419123 ugh 2024-11-13 09:54:20 -06:00
mrq
ad7cfffc00 NAR-len RVQ-0 was being trained causally............. 2024-11-13 09:43:50 -06:00
mrq
976ee87f6f resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated 2024-11-13 09:09:28 -06:00
mrq
8286aa54c8 do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so 2024-11-13 09:07:10 -06:00
mrq
caf721c67b set it to zero because it'll make the stop token hide more often than not 2024-11-12 22:30:50 -06:00
mrq
0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq
663f07038d haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output)) 2024-11-12 16:41:58 -06:00
mrq
b09328069e actually do CFG sampling for base AR+NAR tasks 2024-11-12 13:42:39 -06:00
mrq
2495a7ef67 Fixed STT in the web UI 2024-11-12 12:49:53 -06:00
mrq
8927bad7bc actually fixed rep pen (for ar and nar, it seems to help with nar unmasking) 2024-11-11 21:40:19 -06:00
mrq
ec92613847 actually pass input prompt length size to inference 2024-11-11 20:39:48 -06:00
mrq
b1df6a7bed reverted rep pen sampler due to a regression 2024-11-11 20:35:08 -06:00
mrq
b1f4db39c8 threw in CFG sampling for normal model as well to experiment with 2024-11-11 20:27:38 -06:00
mrq
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq
354f8e059d store dataset hash alongside state dict so it can be ignored if mismatched 2024-11-11 18:16:56 -06:00
mrq
f7b8b1e825 dropped subtrain dataloader since its useless to duplicate 2024-11-11 17:00:49 -06:00
mrq
cf9df71f2c use homwbrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used) 2024-11-11 16:32:08 -06:00
mrq
a748e223ce tweaks 2024-11-11 12:40:41 -06:00
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
9def34cd66 lol 2024-11-10 12:48:41 -06:00
mrq
9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq
ad7e290a5e ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup) 2024-11-09 19:40:02 -06:00
mrq
943fe70c10 I don't know why this fixes an assert thrown but it does 2024-11-09 19:04:13 -06:00
mrq
f50d92ba6c Almost made a mistake 2024-11-09 18:12:54 -06:00
mrq
c6a38693a2 This better work 2024-11-09 18:04:59 -06:00
mrq
8b3d1cf70a Something's Wrong 2024-11-09 15:07:43 -06:00
mrq
dcd5fecff3 some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audo after 3 seconds or so) 2024-11-09 12:12:46 -06:00
mrq
69b0b3b854 set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp 2024-11-09 00:11:16 -06:00
mrq
5a09a5f6e9 I forgot about the time embedding... 2024-11-08 22:46:26 -06:00
mrq
811b15d280 I suppose I just have a shit training method since the sampler is as solid as I can get it............... 2024-11-08 22:05:41 -06:00
mrq
13b54953bd agony 2024-11-08 13:34:39 -06:00
mrq
c127c4e488 'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough) 2024-11-07 21:19:14 -06:00
mrq
e108c54daf new NAR-len training paradigm...... 2024-11-07 11:32:11 -06:00
mrq
ed174c589e ugh 2024-11-07 09:19:21 -06:00
mrq
d13ab00ad8 one more note 2024-11-07 09:11:21 -06:00
mrq
5698188824 あたしって、ほんとバカ 2024-11-07 09:10:18 -06:00
mrq
77ff23e319 repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more) 2024-11-06 23:29:53 -06:00
mrq
a3bc26f7ec ugh 2024-11-06 23:16:28 -06:00
mrq
d606a693ff eval fix for nar-len 2024-11-06 23:14:16 -06:00
mrq
105ed51159 I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something) 2024-11-06 19:17:12 -06:00
mrq
bcabde3454 more notes 2024-11-06 13:51:28 -06:00