Commit Graph

18 Commits

Author  SHA1  Message  Date
mrq  5cd71ef238  QoL so I can stop having to manually inject different configs  2025-03-06 14:48:14 -06:00
mrq  0d809561c6  accuracy k=1 and k=80 because im probably dumb for k=10 as the default since it does not represent any usecase  2025-03-05 16:35:34 -06:00
mrq  2fb2b732fc  wow that was fast  2025-03-04 23:17:18 -06:00
mrq  0451f75e33  now that the new model seems a little more promising, i can re-document things non-cynically  2025-03-03 13:21:41 -06:00
mrq  3f1070f575  tweaks  2025-03-02 22:36:25 -06:00
mrq  17094b8002  reticulating splines  2025-03-01 17:48:51 -06:00
mrq  b97faa8173  fixes...  2025-02-28 18:53:07 -06:00
mrq  4e7d885542  lol  2025-02-28 18:06:41 -06:00
mrq  a174c33db6  a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)  2025-02-28 17:56:50 -06:00
mrq  09d82a26fe  ugh  2025-02-28 01:06:38 -06:00
mrq  93feb5660f  do not like that  2025-02-27 23:59:56 -06:00
mrq  f4f435d7f5  when you already had these ideas to stabilize training but you just ignored them  2025-02-27 23:39:20 -06:00
mrq  0a45c9c042  fix attention backend not being used  2025-02-27 21:38:38 -06:00
mrq  b8e9f3d785  maybe this will work  2025-02-27 20:42:12 -06:00
mrq  01e96bafc9  ugh  2025-02-27 19:05:32 -06:00
mrq  ceecac6ffe  I think I made resp_parallel_training=True faster with loss factoring?  2025-02-26 23:13:32 -06:00
mrq  cbd4d7d7f4  ugh  2025-02-26 21:31:10 -06:00
mrq  2ea387c08a  segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)  2025-02-26 21:26:13 -06:00