cce929e136 | nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 | 2024-01-26 19:41:12 -06:00
0db3203b21 | added LLaMA/Mixtral (if experts > 1) model arches, utilized XMoE's loss as well, and set MoE frequency to 1 so every layer is MoE'd for RetNet (going to run tests without burning out again to see how things go) | 2023-12-22 19:27:36 -06:00
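A rough illustration (not this repo's code) of what an MoE frequency of 1 means here: with a hypothetical moe_freq setting, every moe_freq-th layer swaps its feed-forward block for a mixture-of-experts block, so moe_freq = 1 makes every layer MoE'd. Names below are illustrative assumptions.

    # Hypothetical sketch of MoE layer placement by frequency; names are illustrative,
    # not taken from this repository.
    def build_layers(num_layers: int, moe_freq: int):
        """Return a per-layer flag: True where the FFN is replaced with an MoE block."""
        layers = []
        for idx in range(num_layers):
            # moe_freq = 1 -> every layer is MoE'd; moe_freq = 2 -> every other layer; 0 -> none.
            use_moe = moe_freq > 0 and (idx % moe_freq) == 0
            layers.append(use_moe)
        return layers

    assert all(build_layers(num_layers=12, moe_freq=1))  # every layer MoE'd when freq is 1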
12cfc9e502 | added prodigyopt as a dependency because I keep forgetting | 2023-10-04 19:42:56 -05:00
c0b25541e3 | restructured some things with the model to remove dead weights | 2023-09-20 19:10:59 -05:00
b6c9686f7d | Do not install DeepSpeed under Windows (to-do: default backend to use local if on Windows) | 2023-08-24 14:27:36 -05:00
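A minimal sketch of the kind of platform guard this commit describes, assuming a hypothetical get_trainer_backend helper; the repository's actual install/config logic may differ.

    # Hypothetical sketch: skip DeepSpeed on Windows and fall back to a local backend.
    # Function and backend names are illustrative, not the repo's actual API.
    import sys

    def get_trainer_backend(requested: str = "deepspeed") -> str:
        if sys.platform == "win32":
            # DeepSpeed is not installed on Windows, so default to the local backend there.
            return "local"
        return requested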
2e03e5ac93 | Fixed an issue where having fairseq installed at all would brick logging | 2023-08-02 22:57:10 -05:00
0f9b81de75 | oops | 2023-08-02 18:12:36 -05:00
bf8cedc9dd | Rewrite init | 2023-08-02 21:53:35 +00:00