• https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.

    XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG

  • Joined on 2022-10-10
mrq pushed to master at mrq/vall-e 2025-02-28 05:52:51 +00:00
    fc25a9a7dc do not like that
mrq pushed to master at mrq/vall-e 2025-02-28 05:50:46 +00:00
    396163d40d do not like that
mrq pushed to master at mrq/vall-e 2025-02-28 05:34:28 +00:00
    f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them
mrq pushed to master at mrq/vall-e 2025-02-28 03:33:46 +00:00
    0a45c9c042 fix attention backend not being used
mrq pushed to master at mrq/vall-e 2025-02-28 03:31:23 +00:00
    3171712440 fix attention backend not being used
mrq pushed to master at mrq/vall-e 2025-02-28 02:37:25 +00:00
    b8e9f3d785 maybe this will work
mrq pushed to master at mrq/vall-e 2025-02-28 01:00:47 +00:00
mrq pushed to master at mrq/vall-e 2025-02-28 00:55:31 +00:00
    eff180248c decoupled llama backend to avoid any funny changes from transformers, removed other backends since i dont think i'll ever bother using them
mrq pushed to master at mrq/vall-e 2025-02-27 05:08:36 +00:00
    ceecac6ffe I think I made resp_parallel_training=True faster with loss factoring?
mrq pushed to master at mrq/vall-e 2025-02-27 04:43:35 +00:00
mrq pushed to master at mrq/vall-e 2025-02-27 04:42:43 +00:00
    0c224090d7 slightly faster pathway for resp_parallel_training=True with loss factoring
mrq pushed to master at mrq/vall-e 2025-02-27 03:55:30 +00:00
    06ef3daf3c require minimum of 1 second durations for training because of my slop code auto-transposing that I don't wanna fix right now
mrq pushed to master at mrq/vall-e 2025-02-27 03:26:10 +00:00
mrq pushed to master at mrq/vall-e 2025-02-27 03:21:08 +00:00
    2ea387c08a segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working)
mrq pushed to master at mrq/vall-e 2025-02-26 16:44:01 +00:00
mrq pushed to master at mrq/vall-e 2025-02-26 16:40:54 +00:00
mrq pushed to master at mrq/vall-e 2025-02-26 16:34:13 +00:00
    95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)
mrq pushed to master at mrq/vall-e 2025-02-25 21:09:27 +00:00
    de27115bb7 there's something wrong with it on my 4xV100 rig......
mrq pushed to master at mrq/vall-e 2025-02-25 03:02:45 +00:00
    db181f8e88 only do auto=equal for nemo as its an FSQ
mrq pushed to master at mrq/vall-e 2025-02-25 02:58:31 +00:00
    a5a04c39ef when the