d7c6be6f78 | fix weird regression in handling checkpoints when the backend is local but DeepSpeed checkpoints are in use (it was handled for LoRA loading but not regular loading...) | 2024-07-30 22:15:56 -05:00
06e948aec1 | suppress warning on exit about distributed not being cleaned up (because I updated my system) | 2024-07-25 16:50:47 -05:00
188d116222 | some weird fixes for an equally weird regression with LoRA loading | 2024-07-22 20:47:24 -05:00
75b04686f8 | added prom-less training / inferencing, some other things | 2024-07-22 19:36:07 -05:00
d87b492295 | added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that) | 2024-07-19 20:49:40 -05:00
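
Embedding a wav directly into a demo page boils down to a base64 data URI; a minimal sketch of the idea (the `make_demo_entry` helper and the `demo/` file layout are illustrative, not the repo's actual demo creator):

```python
import base64
from pathlib import Path

def make_demo_entry(wav_path: Path, caption: str) -> str:
    # embed the wav as a base64 data URI so the page needs no external files
    b64 = base64.b64encode(wav_path.read_bytes()).decode("utf-8")
    return f'<p>{caption}</p><audio controls src="data:audio/wav;base64,{b64}"></audio>'

# stitch every entry into one self-contained HTML page
entries = [make_demo_entry(p, p.stem) for p in sorted(Path("demo").glob("*.wav"))]
Path("demo.html").write_text("<html><body>\n" + "\n".join(entries) + "\n</body></html>")
```
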
d53038a9e4 | actually have split classifiers working | 2024-07-19 15:33:31 -05:00
fe0f235335 | mechanism to store the model config inside the weights and load them, some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad, and nothing seems to remedy it) | 2024-07-16 18:23:13 -05:00
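
Storing the config inside the weights generally just means saving a dict carrying both; a rough sketch of the idea, assuming the config is a dataclass (the "module" and "config" key names are assumptions, not necessarily what the export routine uses):

```python
from dataclasses import asdict
import torch

def save_checkpoint(model, cfg, path):
    # keep the model config next to the weights so the checkpoint is self-describing
    torch.save({"module": model.state_dict(), "config": asdict(cfg)}, path)

def load_checkpoint(model_cls, path):
    ckpt = torch.load(path, map_location="cpu")
    cfg = ckpt.get("config")
    model = model_cls(**cfg) if cfg else model_cls()  # rebuild the model from the embedded config
    model.load_state_dict(ckpt["module"])
    return model, cfg
```
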
3acc54df22 | allow loading a different model within the web UI (apparently I did not have the web UI in the documentation) | 2024-07-15 19:59:48 -05:00
c4dd523b6f | change from chunk-slicing paths for distributed dataloader to instead interleave | 2024-06-29 10:10:35 -05:00
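
The difference between chunk-slicing and interleaving paths across ranks, in plain Python (rank/world-size handling simplified; the chunked slice is shown only for contrast):

```python
def shard_paths(paths: list, rank: int, world_size: int) -> list:
    # old behaviour: hand each rank one contiguous chunk of the (often sorted) path list
    chunk = len(paths) // world_size
    chunked = paths[rank * chunk : (rank + 1) * chunk]

    # new behaviour: interleave, so every rank gets a slice spread across the whole list
    interleaved = paths[rank::world_size]
    return interleaved
```

Interleaving presumably keeps each rank's shard representative of the whole dataset even when the path list is ordered (e.g. by duration), instead of giving each rank one homogeneous chunk.
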
dd40463803 | limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) | 2024-06-29 09:11:28 -05:00
1a392b69f6 | local training backend should be a bit more aware of variable batch sizes, maybe | 2024-06-28 22:39:05 -05:00
8fffb94964 | backport fix from tortoise_tts with local trainer + loading state when training a LoRA | 2024-06-25 13:41:29 -05:00
8a986eb480 | load exported LoRA weights if they exist (to-do: make a better LoRA loading mechanism) | 2024-06-18 21:45:46 -05:00
7cfb78fa64 | enable LoRA for targeted RVQ levels (to experiment with, seems to help) | 2024-06-17 21:45:03 -05:00
7047fcc6e2 | actually make deepspeed work with LoRAs | 2024-06-17 13:55:37 -05:00
1d159b1476 | updated export routine to split LoRA weights from the state dict (should work with deepspeed) | 2024-06-17 13:28:18 -05:00
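
Splitting LoRA weights out of a combined state dict mostly amounts to partitioning by key; a sketch (the "lora_" naming convention for adapter parameters is an assumption):

```python
def split_lora(state_dict: dict) -> tuple[dict, dict]:
    # partition the combined state dict by key name so the adapter can be exported on its own
    lora = {k: v for k, v in state_dict.items() if "lora_" in k}
    base = {k: v for k, v in state_dict.items() if "lora_" not in k}
    return base, lora
```
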
726a4b613f | naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later) | 2024-06-17 13:17:24 -05:00
bd0bc10ec0 | added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms | 2024-06-17 13:05:06 -05:00
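
An inclusion/exclusion policy like that usually reduces to substring checks against module names; a hedged sketch (the helper name and the example terms are illustrative, not the repo's actual defaults):

```python
def lora_policy(module_name: str,
                include=("attn", "ffn"),
                exclude=("embedding", "classifier")) -> bool:
    # adapt a layer only if its name matches an inclusion term and no exclusion term
    if any(term in module_name for term in exclude):
        return False
    return any(term in module_name for term in include)
```
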
45a39fb79f | very rudimentary LoRA support (no DeepSpeed support, tested training and saving but not loading yet) | 2024-06-17 00:09:16 -05:00
19410a919e | ugh | 2024-06-15 12:29:03 -05:00
a7a6e0ac76 | validated that inferencing works, changed some defaults (NAR benefits from greedy sampling) | 2024-06-09 17:11:38 -05:00
132a02c48b | sanity cleanup, backup config yaml for each log file | 2024-06-09 11:22:52 -05:00
8d92dac829 | forgot I renamed this | 2024-06-09 11:12:30 -05:00
4ade2b60ee | ugh | 2024-06-06 21:57:11 -05:00
fcac9503e2 | cleanup | 2024-06-06 13:08:02 -05:00
b2194b859a | re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once) | 2024-06-06 09:48:43 -05:00
b05a905b95 | ugh | 2024-06-05 21:02:05 -05:00
e50edc3b48 | added a flag to convert to an HF-compatible model on export by stitching things | 2024-06-03 22:34:47 -05:00
934672252b | feverish cleanup | 2024-06-03 21:28:49 -05:00
c2a436d368 | somehow between training sessions grad_norm = None even though it worked before | 2024-06-02 08:29:27 -05:00
827cf632e7 | report current loss scale and adjust grad norm by loss scale (for deepspeed) | 2024-06-01 10:44:32 -05:00
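
With fp16 loss scaling the raw gradient norm is inflated by the current scale, so a meaningful reported norm needs dividing by it; a sketch of the arithmetic (how the scale is fetched from the DeepSpeed engine is glossed over here and left as an argument):

```python
import torch

def true_grad_norm(parameters, loss_scale: float) -> float:
    # fp16 training multiplies the loss by the scale before backward, so raw gradients
    # (and their norm) are inflated by the same factor; divide it back out for reporting
    grads = [p.grad.detach().flatten() for p in parameters if p.grad is not None]
    raw_norm = torch.linalg.vector_norm(torch.cat(grads)).item()
    return raw_norm / loss_scale
```
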
856545f8bb | NaN loss detection (should have added it earlier), loss scaling for local backend + fp16 | 2024-05-11 22:23:29 -05:00
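
A sketch of both pieces for a local (non-DeepSpeed) trainer using PyTorch's stock GradScaler; the `train_step` shape is illustrative, not the repo's actual loop:

```python
import math
import torch

scaler = torch.cuda.amp.GradScaler()  # stock PyTorch loss scaling for the local fp16 path

def train_step(model, batch, optimizer):
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch)
    if not math.isfinite(loss.item()):
        # NaN/inf loss detection: drop the step instead of poisoning the weights
        optimizer.zero_grad(set_to_none=True)
        return None
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # internally skips the update if gradients overflowed
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```
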
88e9b9caff | local DDP fix | 2024-05-11 17:29:01 -05:00
71e373064f | remove redundant loss, tweak readme | 2024-05-11 15:02:47 -05:00
c22a177cf8 | forgot to pass warmup to ScheduleFree | 2024-05-09 22:18:49 -05:00
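
ScheduleFree (facebookresearch/schedule_free) takes warmup as a constructor argument rather than a separate scheduler, which is easy to forget to thread through from the config; a minimal sketch, assuming the `schedulefree` package and that the wrapper receives a `warmup_steps` value:

```python
import schedulefree

def build_optimizer(model, warmup_steps: int):
    # warmup is passed directly to the optimizer, not to an LR scheduler
    optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1.0e-3, warmup_steps=warmup_steps)
    optimizer.train()  # ScheduleFree also needs explicit train()/eval() mode switches
    return optimizer
```
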
0d5d545a40 | crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to try it weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. | 2024-05-09 20:28:20 -05:00
8aa1b2dabf | documentation update | 2024-05-04 21:03:46 -05:00
c494894261 | simple DDP wrapper (for my NVLink test) | 2024-05-04 11:48:26 -05:00
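
A "simple DDP wrapper" in PyTorch is essentially the following; the environment-variable based initialization is one common torchrun setup, not necessarily the repo's exact code:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model: torch.nn.Module) -> torch.nn.Module:
    # torchrun provides RANK/LOCAL_RANK/WORLD_SIZE; init the process group, then wrap the model
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    model = model.to(f"cuda:{local_rank}")
    return DDP(model, device_ids=[local_rank])
```
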
a7b43b98b5 | renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) | 2024-05-02 20:08:59 -05:00
545162195b | deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things | 2024-04-15 19:54:32 -05:00
f0c4baeb25 | added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) | 2024-04-09 22:04:01 -05:00
4d75ee066c | actually do the Linear replacement with TE's Linear | 2024-04-09 14:41:13 -05:00
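
A hedged sketch of what swapping nn.Linear for TransformerEngine's Linear can look like; the recursive replacement and weight copying shown here are a generic pattern, not necessarily the repo's exact code:

```python
import torch.nn as nn
import transformer_engine.pytorch as te

def replace_linear_with_te(module: nn.Module) -> None:
    # walk the module tree, swapping every nn.Linear for te.Linear and keeping the weights
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new = te.Linear(child.in_features, child.out_features, bias=child.bias is not None)
            new.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new.bias.data.copy_(child.bias.data)
            setattr(module, name, new)
        else:
            replace_linear_with_te(child)
```
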
9d97eb5104 | added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) | 2024-04-08 20:14:51 -05:00
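
Once the Linear layers are TE modules, FP8 is enabled at the autocast level; a minimal sketch using TransformerEngine's documented context manager (recipe settings are illustrative):

```python
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)  # E4M3 forward, E5M2 backward

def forward_fp8(model, batch):
    # any TE modules (e.g. te.Linear) inside the model run their GEMMs in FP8 here
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        return model(batch)
```
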
7075c2a5f0 | added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent training runs (defined under cfg.models._embeddings as a relative path to the yaml) | 2024-04-04 19:11:49 -05:00
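
Conceptually, injecting embeddings from another model means copying just the embedding tensors out of a donor checkpoint before training; a very rough sketch (matching keys on "emb" and assuming the donor checkpoint is a flat state dict are both assumptions, not the repo's mechanism):

```python
import torch

def inject_embeddings(model: torch.nn.Module, donor_checkpoint: str) -> None:
    # copy embedding tensors that exist (with matching shapes) in both models; leave the rest alone
    donor = torch.load(donor_checkpoint, map_location="cpu")
    own = model.state_dict()
    matched = {k: v for k, v in donor.items() if "emb" in k and k in own and own[k].shape == v.shape}
    model.load_state_dict(matched, strict=False)
```
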
91062361af | tweaks | 2024-03-01 20:38:06 -06:00
f3c59c3e7e | cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) | 2024-03-01 20:18:43 -06:00
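
In a local (non-DeepSpeed) trainer both pieces can come from the same call, since PyTorch's clip_grad_norm_ returns the total norm it computed; a short sketch:

```python
import torch

def clip_and_report(model: torch.nn.Module, max_norm: float = 1.0) -> float:
    # clip_grad_norm_ clips in place and returns the pre-clip total gradient norm,
    # so one call both enforces the limit and gives a number to log
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    return float(grad_norm)
```
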
47435207f7 | Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model | 2024-03-01 19:20:10 -06:00
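
A replace-style optimization like this usually means walking the module tree and substituting nn.Linear with a bitsandbytes equivalent; a hedged sketch using Linear8bitLt (the target class and options the repo actually uses may differ):

```python
import torch.nn as nn
import bitsandbytes as bnb

def replace_linear_with_8bit(module: nn.Module) -> None:
    # recursively swap nn.Linear modules for bitsandbytes' 8-bit Linear
    # (weights still need to be loaded into the new modules afterwards)
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new = bnb.nn.Linear8bitLt(
                child.in_features, child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False, threshold=6.0,
            )
            setattr(module, name, new)
        else:
            replace_linear_with_8bit(child)
```
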
3da1518ace | added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) | 2024-01-31 21:48:36 -06:00
c690aa509d | fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) | 2023-12-25 21:20:32 -06:00
9c198eb75a | added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) | 2023-12-20 18:45:58 -06:00