| 93feb5660f | do not like that | 2025-02-27 23:59:56 -06:00 |
| f4f435d7f5 | when you already had these ideas to stabilize training but you just ignored them | 2025-02-27 23:39:20 -06:00 |
| 0a45c9c042 | fix attention backend not being used | 2025-02-27 21:38:38 -06:00 |
| b8e9f3d785 | maybe this will work | 2025-02-27 20:42:12 -06:00 |
| 01e96bafc9 | ugh | 2025-02-27 19:05:32 -06:00 |
| ceecac6ffe | I think I made resp_parallel_training=True faster with loss factoring? | 2025-02-26 23:13:32 -06:00 |
| cbd4d7d7f4 | ugh | 2025-02-26 21:31:10 -06:00 |
| 2ea387c08a | segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) | 2025-02-26 21:26:13 -06:00 |