vall-e/vall_e/models/arch
| Name | Last commit message | Last commit date |
|------|---------------------|------------------|
| attention/ | | |
| mamba_vasqu/ | | |
| retnet_syncdoth/ | | |
| __init__.py | layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this) | 2024-10-30 20:05:45 -05:00 |
| bitnet.py | | |
| llama.py | actually float16(+AMP) and layerskip is bad and will kill the model... | 2024-11-01 18:36:44 -05:00 |
| mamba.py | | |
| mixtral.py | | |
| retnet.py | | |
| transformer.py | | |