• https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.

    XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG

  • Joined on 2022-10-10
mrq pushed to master at mrq/vall-e 2025-02-28 05:52:51 +00:00
    fc25a9a7dc do not like that
mrq pushed to master at mrq/vall-e 2025-02-28 05:50:46 +00:00
    396163d40d do not like that
mrq pushed to master at mrq/vall-e 2025-02-28 05:34:28 +00:00
    f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them
mrq pushed to master at mrq/vall-e 2025-02-28 03:33:46 +00:00
    0a45c9c042 fix attention backend not being used
mrq pushed to master at mrq/vall-e 2025-02-28 03:31:23 +00:00
    3171712440 fix attention backend not being used
mrq pushed to master at mrq/vall-e 2025-02-28 02:37:25 +00:00
    b8e9f3d785 maybe this will work
mrq pushed to master at mrq/vall-e 2025-02-28 01:00:47 +00:00
mrq pushed to master at mrq/vall-e 2025-02-28 00:55:31 +00:00
    eff180248c decoupled llama backend to avoid any funny changes from transformers, removed other backends since i dont think i'll ever bother using them
mrq pushed to master at mrq/vall-e 2025-02-27 05:08:36 +00:00
    ceecac6ffe I think I made resp_parallel_training=True faster with loss factoring?
mrq pushed to master at mrq/vall-e 2025-02-27 04:43:35 +00:00
mrq pushed to master at mrq/vall-e 2025-02-27 04:42:43 +00:00
    0c224090d7 slightly faster pathway for resp_parallel_training=True with loss factoring
mrq pushed to master at mrq/vall-e 2025-02-27 03:55:30 +00:00
    06ef3daf3c require minimum of 1 second durations for training because of my slop code auto-transposing that I don't wanna fix right now
mrq pushed to master at mrq/vall-e 2025-02-27 03:26:10 +00:00
mrq pushed to master at mrq/vall-e 2025-02-27 03:21:08 +00:00
    2ea387c08a segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)
mrq pushed to master at mrq/vall-e 2025-02-26 16:44:01 +00:00
mrq pushed to master at mrq/vall-e 2025-02-26 16:40:54 +00:00
mrq pushed to master at mrq/vall-e 2025-02-26 16:34:13 +00:00
    95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)
mrq pushed to master at mrq/vall-e 2025-02-25 21:09:27 +00:00
    de27115bb7 there's something wrong with it on my 4xV100 rig......
mrq pushed to master at mrq/vall-e 2025-02-25 03:02:45 +00:00
    db181f8e88 only do auto=equal for nemo as its an FSQ
mrq pushed to master at mrq/vall-e 2025-02-25 02:58:31 +00:00
    a5a04c39ef when the