mrq/vall-e

mrq e513d2ef19 experts weren't forwarded into constructor (wasted a few days of training garbage)		2023-12-23 16:08:17 -06:00
__init__.py	experts weren't forwarded into constructor (wasted a few days of training garbage)	2023-12-23 16:08:17 -06:00
adaln.py
ar_nar.py	added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)	2023-12-22 19:27:36 -06:00
ar.py	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
base.py	added LLaMA/Mixtral (if experts>1) model arches, utilize XMoE's loss as well, set MoE frequency to 1 to make every layer MoE'd for RetNet, etc. (going to do tests without burning out again to see how things go)	2023-12-22 19:27:36 -06:00
nar.py	actually use langs from the dataloader	2023-10-11 21:21:50 -05:00
retnet.py	restructured some things with the model to remove dead weights	2023-09-20 19:10:59 -05:00
transformer.py