mrq/vall-e
vall-e/vall_e/utils (at a174c33db6)
mrq
a174c33db6
a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)
2025-02-28 17:56:50 -06:00
ext
made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split)
2025-02-26 10:39:13 -06:00
__init__.py
ugh
2025-02-28 01:06:38 -06:00
distributed.py
io.py
ml.py
borrowed muon since it might better work under deepspeed and not require cruft (even though it really does not like the masked-NAR, also make the masked-NAR faux-causal since it might better help out for cfg.model.version >= 7
2025-02-23 17:23:24 -06:00
pattern.py
sampler.py
trainer.py
when you already had these ideas to stabilize training but you just ignored them
2025-02-27 23:39:20 -06:00
utils.py
a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)
2025-02-28 17:56:50 -06:00