vall-e/vall_e/models/arch
2025-02-23 11:22:13 -06:00
attention
__init__.py ugh 2025-02-09 13:02:51 -06:00
bitnet.py
llama.py agony 2025-02-12 00:18:24 -06:00
mamba.py
mixtral.py
retnet.py
transformer.py added muon optimizer through kludge hacks because it necessitates a second optimizer in tandem that seems to only sometimes work with deepspeed 2025-02-23 11:22:13 -06:00