Commit Graph

347 Commits

Author SHA1 Message Date
mrq
cf97560e70 minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now 2024-12-03 19:40:05 -06:00
mrq
ca31da0a95 sageattn (forgot to bother with testing this the other day, seems ifne) 2024-12-03 15:14:57 -06:00
mrq
84a05acb6d touch ups in docs 2024-12-02 19:10:42 -06:00
mrq
dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00
mrq
41d7c30ea5 added much cleaner non-causal mask generation 2024-11-22 19:43:32 -06:00
mrq
c99a74e834 actually generate a causal mask because it seems sometimes it does not actually generate one because it makes assumptions 2024-11-22 18:30:24 -06:00
mrq
ccee5fc11c that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one 2024-11-22 16:51:50 -06:00
mrq
4aa685e749 what has science done 2024-11-22 16:45:40 -06:00
mrq
147219a5e0 huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks) 2024-11-22 13:44:43 -06:00
mrq
24d888c47c temporarily dropping support for xformers because it's breaking when using an attention mask (which i dont remember commenting it out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessionsS) 2024-11-22 11:29:12 -06:00
mrq
8aafae91fd dont use timeembedding 2024-11-21 23:14:52 -06:00
mrq
2cef97e43f cleanup 2024-11-21 23:08:43 -06:00
mrq
67f7bad168 added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated) 2024-11-20 14:22:12 -06:00
mrq
b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len) 2024-11-19 18:51:17 -06:00
mrq
190a917b3e I did it. 2024-11-19 12:24:33 -06:00
mrq
0e621354e7 cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model) 2024-11-19 10:30:05 -06:00
mrq
5ba80686e1 two weeks of agony concludes 2024-11-18 21:29:28 -06:00
mrq
2b29790173 oops 2024-11-18 14:12:26 -06:00
mrq
6cfdf94bf9 swap priority to use nar-len if available, added notes 2024-11-18 09:40:04 -06:00
mrq
069b27570f set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things) 2024-11-17 17:04:07 -06:00
mrq
88d840218d default set cfg strength to 3.0 since the reference model is updated 2024-11-17 10:23:40 -06:00
mrq
a3e1fa3518 ugh 2024-11-17 09:28:33 -06:00
mrq
23fdba0c98 tweaks and changes 2024-11-16 15:49:06 -06:00
mrq
2fbeacfe92 ugh 2024-11-14 22:18:33 -06:00
mrq
39096f8ff3 redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) 2024-11-14 22:17:47 -06:00
mrq
e412e98125 ugh 2024-11-14 07:34:22 -06:00
mrq
c00fc18b62 actually use the right embedding for nar-len 2024-11-13 18:04:04 -06:00
mrq
3ea8a610d6 fix STT 2024-11-13 14:27:15 -06:00
mrq
910033343c overhauled how the right resp level / classifier gets picked to avoid cringemath 2024-11-13 13:31:17 -06:00
mrq
269648605e move NAR-len rvq level 0 to separate embedding 2024-11-13 11:38:58 -06:00
mrq
be83ddabaa better causal-ness for split loss calc, and also do masking for NAR-len for it 2024-11-13 10:17:52 -06:00
mrq
6b76419123 ugh 2024-11-13 09:54:20 -06:00
mrq
ad7cfffc00 NAR-len RVQ-0 was being trained causally............. 2024-11-13 09:43:50 -06:00
mrq
8286aa54c8 do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so 2024-11-13 09:07:10 -06:00
mrq
0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq
663f07038d haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output)) 2024-11-12 16:41:58 -06:00
mrq
b09328069e actually do CFG sampling for base AR+NAR tasks 2024-11-12 13:42:39 -06:00
mrq
2495a7ef67 Fixed STT in the web UI 2024-11-12 12:49:53 -06:00
mrq
8927bad7bc actually fixed rep pen (for ar and nar, it seems to help with nar unmasking) 2024-11-11 21:40:19 -06:00
mrq
b1f4db39c8 threw in CFG sampling for normal model as well to experiment with 2024-11-11 20:27:38 -06:00
mrq
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq
a748e223ce tweaks 2024-11-11 12:40:41 -06:00
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
9def34cd66 lol 2024-11-10 12:48:41 -06:00
mrq
9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq
ad7e290a5e ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup) 2024-11-09 19:40:02 -06:00
mrq
943fe70c10 I don't know why this fixes an assert thrown but it does 2024-11-09 19:04:13 -06:00
mrq
f50d92ba6c Almost made a mistake 2024-11-09 18:12:54 -06:00
mrq
c6a38693a2 This better work 2024-11-09 18:04:59 -06:00