|
4d75ee066c
|
actually do the Linear replacement with TE's Linear
|
2024-04-09 14:41:13 -05:00 |
|
|
7075c2a5f0
|
added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)
|
2024-04-04 19:11:49 -05:00 |
|
|
47435207f7
|
Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model
|
2024-03-01 19:20:10 -06:00 |
|
|
3da1518ace
|
added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)
|
2024-01-31 21:48:36 -06:00 |
|
|
c690aa509d
|
fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)
|
2023-12-25 21:20:32 -06:00 |
|
|
32d4271ca8
|
fixed issue with training from scratch (oops)
|
2023-10-21 09:55:38 -05:00 |
|
|
65f500083d
|
tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work
|
2023-10-12 22:21:43 -05:00 |
|
|
893a610fad
|
cleanup, use deepspeed inferencing pathway if requested
|
2023-10-09 15:24:04 -05:00 |
|
|
87c4bfedba
|
added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)
|
2023-08-27 12:26:12 -05:00 |
|
|
d89568a96e
|
some fixes for the local framework
|
2023-08-05 03:22:15 +00:00 |
|
|
0a524f1d59
|
reticulating splines
|
2023-08-03 21:39:00 -05:00 |
|
|
c85101403f
|
big cleanup
|
2023-08-03 20:26:36 -05:00 |
|