mrq / vall-e
vall-e / vall_e / models (History)
Latest commit: 3da1518ace by mrq, 2024-01-31 21:48:36 -06:00
    added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)
__init__.py (2024-01-31 21:48:36 -06:00)
    added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)

adaln.py (2023-08-02 22:06:39 +00:00)
    Tweaks

ar_nar.py (2024-01-31 21:48:36 -06:00)
    added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)

ar.py (2023-10-11 21:21:50 -05:00)
    actually use langs from the dataloader

base.py (2024-01-31 21:48:36 -06:00)
    added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)

nar.py (2023-10-11 21:21:50 -05:00)
    actually use langs from the dataloader

retnet.py (2023-09-20 19:10:59 -05:00)
    restructured some things with the model to remove dead weights

transformer.py (2023-09-05 15:38:21 -05:00)
    added ability to disable activation checkpointing through the YAML (it is very VRAM intensive at double layer size)