aefe8fcdad | UGH | 2024-11-05 22:13:58 -06:00
ccf71dc1b6 | added option to load from a model state dict directly instead of a YAML (to-do: do this for LoRAs too); automatically download the default model if none is provided | 2024-10-25 22:15:15 -05:00
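
The gist of the state-dict path above, sketched with plain PyTorch; the tiny `Model` class and the file name are hypothetical stand-ins, not this repo's actual classes:

```python
import torch
import torch.nn as nn

class Model(nn.Module):  # hypothetical placeholder model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

def load_from_state_dict(path: str) -> Model:
    model = Model()
    # load serialized weights directly, bypassing any YAML-driven construction
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model

# round-trip demo: save a state dict, then load a fresh model from it
torch.save(Model().state_dict(), "model.pth")
model = load_from_state_dict("model.pth")
```
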
acdce66d4e | readme tweaks; set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well | 2024-10-05 22:53:53 -05:00
31e8b7edb8 | tweaks and fixes for LoRA stuff | 2024-09-08 18:05:21 -05:00
32287710a2 | moved prints to use the logger; edited the readme (fused_attn doesn't seem stable for training) | 2024-08-29 13:27:16 -05:00
b7b99a25f1 | added ability to specify the attention backend for the CLI and webui (because I'm tired of editing the YAML) | 2024-08-26 19:33:51 -05:00
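
For reference, stock PyTorch (2.3+) exposes attention-backend selection like this; how the CLI/webui option maps onto it is an assumption on my part, and the shapes are illustrative:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 4, 16, 32)  # (batch, heads, seq, head_dim)

# pin the math backend instead of letting PyTorch auto-select one
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)
```
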
bc2a6fa756 | sanity cleanup: moved experimental features under their own section | 2024-06-30 10:37:33 -05:00
cca542a4c0 | ugh | 2024-06-11 23:59:28 -05:00
8d068fa3f9 | reticulating splines | 2024-06-08 20:30:15 -05:00
b2194b859a | re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once) | 2024-06-06 09:48:43 -05:00
880b4ecd1b | cleanup; putting some thoughts in comments before I forget about them | 2024-06-05 19:50:06 -05:00
c93d5863fd | fixes | 2024-06-04 00:07:00 -05:00
934672252b | feverish cleanup | 2024-06-03 21:28:49 -05:00
0b6499601b | sanitizing | 2024-05-11 16:31:05 -05:00
545162195b | deprecated the standalone AR and NAR models by keeping only the AR+NAR (the beauty of no one using this is that I can break compat as much as I want); added a tone token for when I classify my dataset with tone/emotion in the future; some other things | 2024-04-15 19:54:32 -05:00
9d97eb5104 | added FP8 support through NVIDIA/TransformerEngine; added RetNet_HF through syncdoth/RetNet (as an alternative, to branch away from torchscale) | 2024-04-08 20:14:51 -05:00
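
A minimal sketch of what FP8 execution through NVIDIA/TransformerEngine looks like (it needs a Hopper/Ada-class GPU); the layer size is illustrative, not the model's:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# delayed-scaling recipe controlling how FP8 scale factors are updated
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(16, 1024, device="cuda")

# matmuls inside this context run in FP8
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```
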
3da1518ace | added a Mistral (non-Mixtral) backend; a useless optimization when not training; proper adjustment of the LR for Prodigyopt through d_coeff (maybe); recurrent sampling for the LLaMA/Mistral/Mixtral backends (again, doesn't actually work) | 2024-01-31 21:48:36 -06:00
e513d2ef19 | experts weren't forwarded into the constructor (wasted a few days of training on garbage) | 2023-12-23 16:08:17 -06:00
65f500083d | tweaks to try and get DeepSpeed quantized inferencing working, validating bitsandbytes and DeepSpeed quantization; nothing seems to work | 2023-10-12 22:21:43 -05:00
100ca6b7d0 | added option to use the SGD optimizer through the YAML; added option to pass in additional optimizer parameters through the YAML; added an experimental unified AR+NAR model (does not seem fruitful in testing) | 2023-09-06 18:58:35 -05:00
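
One plausible way to wire an optimizer name plus extra parameters from a YAML; the keys here are hypothetical, not the repo's actual schema:

```python
import yaml
import torch
import torch.nn as nn

cfg = yaml.safe_load("""
optimizer: SGD
optimizer_params:
  lr: 0.01
  momentum: 0.9
""")

model = nn.Linear(8, 8)
# look the optimizer class up by name and splat the extra parameters in
optimizer_cls = getattr(torch.optim, cfg["optimizer"])
optimizer = optimizer_cls(model.parameters(), **cfg["optimizer_params"])
```
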
451726fdd5 | added ability to disable activation checkpointing through the YAML (it is very VRAM-intensive at double the layer size) | 2023-09-05 15:38:21 -05:00
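
The mechanism being toggled, sketched with plain PyTorch; the `use_checkpointing` flag standing in for the YAML option is my assumption:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
x = torch.randn(4, 64, requires_grad=True)

use_checkpointing = True  # hypothetical stand-in for the YAML flag
if use_checkpointing:
    # recompute activations during backward instead of storing them (saves VRAM)
    y = checkpoint(block, x, use_reentrant=False)
else:
    y = block(x)
y.sum().backward()
```
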
8a6c203277 | added per-speaker samplers | 2023-09-03 21:27:13 -05:00
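
A guess at the shape of a per-speaker sampler: bucket dataset indices by speaker and draw from a randomly chosen speaker each step; this is illustrative, not the repo's implementation:

```python
import random
from torch.utils.data import Sampler

class PerSpeakerSampler(Sampler):
    def __init__(self, speaker_ids):
        # bucket dataset indices by speaker
        self.buckets = {}
        for idx, spk in enumerate(speaker_ids):
            self.buckets.setdefault(spk, []).append(idx)

    def __iter__(self):
        # one shuffled index pool per speaker; draw from a random speaker each step
        pools = {spk: random.sample(ids, len(ids)) for spk, ids in self.buckets.items()}
        while pools:
            spk = random.choice(list(pools))
            yield pools[spk].pop()
            if not pools[spk]:
                del pools[spk]

    def __len__(self):
        return sum(len(ids) for ids in self.buckets.values())

# demo over a toy dataset with two speakers
sampler = PerSpeakerSampler(["a", "a", "b", "b", "b"])
print(list(sampler))
```
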
0a524f1d59 | reticulating splines | 2023-08-03 21:39:00 -05:00
c85101403f | big cleanup | 2023-08-03 20:26:36 -05:00
7a06b27a9c | Tweaks | 2023-08-02 22:06:39 +00:00