vall-e/vall_e/models/arch
attention/
mamba_vasqu/
retnet_syncdoth/
__init__.py layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this) 2024-10-30 20:05:45 -05:00
bitnet.py
llama.py third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router) 2024-11-01 12:50:37 -05:00
mamba.py
mixtral.py
retnet.py
transformer.py
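
The llama.py note above describes folding the early-exit (layer-skip) loss into training as an auxiliary term added to the normal loss, the same way an MoE router's load-balancing loss is combined with the main objective. Below is a minimal PyTorch sketch of that idea under those assumptions; the names (TinyDecoder, early_exit_aux_loss, aux_weight) are hypothetical illustrations and are not taken from this repo.

```python
# Minimal sketch: early-exit (layer-skip) training where the early-exit loss is
# treated as an aux_loss added to the normal LM loss, the way an MoE router's
# auxiliary loss is handled. All names here are hypothetical, not from the repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDecoder(nn.Module):
    """Toy stack of transformer layers with a single shared output head."""
    def __init__(self, vocab_size=256, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d_model, vocab_size)  # shared head for every exit

    def forward(self, tokens):
        x = self.embed(tokens)
        hiddens = []
        for layer in self.layers:
            x = layer(x)
            hiddens.append(x)  # keep per-layer hidden states for the early exits
        return hiddens

def early_exit_aux_loss(hiddens, head, targets):
    """Average cross-entropy over every intermediate layer's exit."""
    losses = [F.cross_entropy(head(h).transpose(1, 2), targets) for h in hiddens[:-1]]
    return torch.stack(losses).mean()

model = TinyDecoder()
tokens = torch.randint(0, 256, (2, 16))   # (batch, seq) dummy inputs
targets = torch.randint(0, 256, (2, 16))  # dummy next-token targets

hiddens = model(tokens)
main_loss = F.cross_entropy(model.head(hiddens[-1]).transpose(1, 2), targets)
aux_weight = 0.1  # hypothetical weighting, analogous to an MoE router aux-loss coefficient
loss = main_loss + aux_weight * early_exit_aux_loss(hiddens, model.head, targets)
loss.backward()
```

In this sketch, aux_weight controls how strongly the intermediate exits are trained relative to the final layer's normal loss.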