fe0f235335 | mechanism to store the model config inside the weights and load them; some other things to allow LoRA training on the RetNet (gradient checkpointing will gripe about inputs not having requires_grad, and nothing seems to remedy it) | 2024-07-16 18:23:13 -05:00
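
For reference, a minimal sketch of what storing the config inside the weights can look like; the checkpoint key names ("module", "config") and helper names are assumptions, not necessarily what this repo uses:

```python
import torch

def save_with_config(model, config_dict, path):
    # bundle the config alongside the weights so the checkpoint is self-describing
    torch.save({
        "module": model.state_dict(),
        "config": config_dict,  # plain dict, survives torch.save/load as-is
    }, path)

def load_with_config(path):
    state = torch.load(path, map_location="cpu")
    return state["module"], state.get("config")  # older checkpoints lack "config"

# The usual workaround for gradient checkpointing griping about inputs not
# having requires_grad is to force it on an early activation via a forward
# hook; per the commit, it did not remedy things here, but for reference:
def make_inputs_require_grad(module, inputs, output):
    output.requires_grad_(True)
# model.text_emb.register_forward_hook(make_inputs_require_grad)  # hypothetical module name
```
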
3acc54df22 | allow loading a different model within the web UI (apparently I did not have the web UI in the documentation) | 2024-07-15 19:59:48 -05:00
c4dd523b6f | change from chunk-slicing paths for the distributed dataloader to interleaving them | 2024-06-29 10:10:35 -05:00
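
A toy illustration of the difference, with made-up paths and a world size of 2:

```python
paths = [f"sample_{i:04d}" for i in range(10)]  # stand-in for the dataset's paths
world_size, rank = 2, 0

# chunk-slicing: each rank gets one contiguous slice, so any ordering in the
# list (by speaker, duration, ...) concentrates on individual ranks
chunk = len(paths) // world_size
chunked = paths[rank * chunk : (rank + 1) * chunk]

# interleaving: stride through the list instead, spreading ordering evenly
interleaved = paths[rank::world_size]

print(chunked)      # ['sample_0000', ..., 'sample_0004']
print(interleaved)  # ['sample_0000', 'sample_0002', ..., 'sample_0008']
```
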
dd40463803 | limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) | 2024-06-29 09:11:28 -05:00
1a392b69f6 | local training backend should be a bit more aware of variable batch sizes, maybe | 2024-06-28 22:39:05 -05:00
8fffb94964 | backport fix from tortoise_tts with local trainer + loading state when training a LoRA | 2024-06-25 13:41:29 -05:00
8a986eb480 | load exported LoRA weights if they exist (to-do: make a better LoRA loading mechanism) | 2024-06-18 21:45:46 -05:00
7cfb78fa64 | enable LoRA for targeted RVQ levels (to experiment with; seems to help) | 2024-06-17 21:45:03 -05:00
7047fcc6e2 | actually make deepspeed work with LoRAs | 2024-06-17 13:55:37 -05:00
1d159b1476 | updated export routine to split LoRA weights from the state dict (should work with deepspeed) | 2024-06-17 13:28:18 -05:00
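
A minimal sketch of splitting LoRA tensors out of a combined state dict; the "lora_" key marker is an assumption borrowed from common LoRA naming, not confirmed from this repo:

```python
def split_state_dict(state_dict, marker="lora_"):
    # separate adapter tensors from base weights by key name
    lora = {k: v for k, v in state_dict.items() if marker in k}
    base = {k: v for k, v in state_dict.items() if marker not in k}
    return base, lora

# base_sd, lora_sd = split_state_dict(model.state_dict())
# torch.save(lora_sd, "lora.pth")  # export the adapter on its own
```
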
726a4b613f | naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights; they can be split later) | 2024-06-17 13:17:24 -05:00
bd0bc10ec0 | added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms | 2024-06-17 13:05:06 -05:00
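
A minimal sketch of such a policy; the term lists are illustrative, not the repo's defaults:

```python
def lora_policy(module_name, include=("attn", "ffn"), exclude=("emb", "norm")):
    """Decide whether a module should receive a LoRA adapter by name."""
    if any(term in module_name for term in exclude):
        return False
    return any(term in module_name for term in include)

# e.g. walk the model and only adapt matching Linear layers:
# for name, module in model.named_modules():
#     if isinstance(module, torch.nn.Linear) and lora_policy(name):
#         ...swap in a LoRA-wrapped Linear...
```
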
45a39fb79f | very rudimentary LoRA support (no deepspeed support; tested training and saving, but not loading yet) | 2024-06-17 00:09:16 -05:00
19410a919e | ugh | 2024-06-15 12:29:03 -05:00
a7a6e0ac76 | validated that inferencing works, changed some defaults (NAR benefits from greedy sampling) | 2024-06-09 17:11:38 -05:00
132a02c48b | sanity cleanup, backup config YAML for each log file | 2024-06-09 11:22:52 -05:00
8d92dac829 | forgot I renamed this | 2024-06-09 11:12:30 -05:00
4ade2b60ee | ugh | 2024-06-06 21:57:11 -05:00
fcac9503e2 | cleanup | 2024-06-06 13:08:02 -05:00
b2194b859a | re-added loading multiple models, because I'm now entertaining having split AR/NAR models again (and need a way to load both at once) | 2024-06-06 09:48:43 -05:00
b05a905b95 | ugh | 2024-06-05 21:02:05 -05:00
e50edc3b48 | added a flag to convert to an HF-compatible model on export by stitching things | 2024-06-03 22:34:47 -05:00
934672252b | feverish cleanup | 2024-06-03 21:28:49 -05:00
c2a436d368 | somehow between training sessions grad_norm = None, even though it worked before | 2024-06-02 08:29:27 -05:00
827cf632e7 | report current loss scale and adjust grad norm by loss scale (for deepspeed) | 2024-06-01 10:44:32 -05:00
856545f8bb | NaN loss detection (should have added it earlier), loss scaling for the local backend + fp16 | 2024-05-11 22:23:29 -05:00
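
A minimal sketch of both mechanisms using torch.cuda.amp; the actual local backend likely structures this differently:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16

def backward_step(loss, optimizer):
    if torch.isnan(loss) or torch.isinf(loss):
        optimizer.zero_grad(set_to_none=True)  # skip rather than poison the weights
        return False
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # internally skips the step if grads overflowed
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return True
```
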
88e9b9caff | local DDP fix | 2024-05-11 17:29:01 -05:00
71e373064f | remove redundant loss, tweak readme | 2024-05-11 15:02:47 -05:00
c22a177cf8 | forgot to pass warmup to ScheduleFree | 2024-05-09 22:18:49 -05:00
0d5d545a40 | crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to try it weeks ago; seems promising), optimization wrapper cleanup, test trainer changes, etc. | 2024-05-09 20:28:20 -05:00
8aa1b2dabf | documentation update | 2024-05-04 21:03:46 -05:00
c494894261 | simple DDP wrapper (for my NVLink test) | 2024-05-04 11:48:26 -05:00
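
A minimal sketch of such a wrapper; the env:// init assumes torchrun-style environment variables (RANK, WORLD_SIZE, MASTER_ADDR, ...):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model):
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    return DDP(model.to(f"cuda:{local_rank}"), device_ids=[local_rank])
```
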
a7b43b98b5 | renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) | 2024-05-02 20:08:59 -05:00
545162195b | deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things | 2024-04-15 19:54:32 -05:00
f0c4baeb25 | added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) | 2024-04-09 22:04:01 -05:00
4d75ee066c | actually do the Linear replacement with TE's Linear | 2024-04-09 14:41:13 -05:00
9d97eb5104 | added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) | 2024-04-08 20:14:51 -05:00
7075c2a5f0 | added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) | 2024-04-04 19:11:49 -05:00
91062361af | tweaks | 2024-03-01 20:38:06 -06:00
f3c59c3e7e | cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) | 2024-03-01 20:18:43 -06:00
47435207f7 | added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model | 2024-03-01 19:20:10 -06:00
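
A minimal sketch of this style of in-place replacement, here swapping in bitsandbytes' 8-bit Linear purely as an illustration (the real option presumably honors the configured quantization backend):

```python
import torch.nn as nn
import bitsandbytes as bnb

def replace_linears(model):
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            # swap the module; loading/copying weights is left to the caller
            setattr(model, name, bnb.nn.Linear8bitLt(
                child.in_features, child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False,
            ))
        else:
            replace_linears(child)  # recurse into submodules
    return model
```
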
3da1518ace | added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) | 2024-01-31 21:48:36 -06:00
c690aa509d | fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) | 2023-12-25 21:20:32 -06:00
9c198eb75a | added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) | 2023-12-20 18:45:58 -06:00
6c51a629cc | resetting step count resets the samples processed and other metrics | 2023-10-29 12:11:19 -05:00
32d4271ca8 | fixed issue with training from scratch (oops) | 2023-10-21 09:55:38 -05:00
09cda7d3f9 | added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup | 2023-10-16 19:30:38 -05:00
65f500083d | tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization; nothing seems to work | 2023-10-12 22:21:43 -05:00
893a610fad | cleanup, use deepspeed inferencing pathway if requested | 2023-10-09 15:24:04 -05:00
4abd6564d1 | fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml | 2023-09-23 19:59:00 -05:00
e7da1eb90d | edge case | 2023-09-20 19:20:17 -05:00
c0b25541e3 | restructured some things with the model to remove dead weights | 2023-09-20 19:10:59 -05:00
5ac119a6e7 | added light web UI (need to port the telemetry-disabling bandaids from aivc) | 2023-09-09 16:17:20 -05:00
8837bc34d7 | added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR) | 2023-09-07 18:19:51 -05:00
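
A minimal sketch of freezing by name; the config shape and example parameter names are hypothetical:

```python
def freeze_params(model, frozen_terms):
    # freeze any parameter whose name matches a term from the YAML list
    for name, param in model.named_parameters():
        if any(term in name for term in frozen_terms):
            param.requires_grad_(False)

# freeze_params(model, ["text_emb", "classifier"])  # hypothetical names
```
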
81b05dabb9 | accurate epoch metric is now reported (based on samples processed / length of the dataset's paths, rather than naive assumptions) | 2023-09-03 08:03:36 -05:00
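
In other words (values made up):

```python
samples_processed = 123_456  # running total the trainer keeps
dataset_len = 48_000         # len(dataset.paths)
print(f"epoch {samples_processed / dataset_len:.2f}")  # "epoch 2.57"
```
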
57db3ccfa8 | shuffled VALL-E continuous as a task tts-c instead, logic fixes for it | 2023-09-02 12:23:40 -05:00
2f06166ddd | cleanups | 2023-09-01 21:33:51 -05:00
e40c0d34a0 | somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype | 2023-09-01 20:58:29 -05:00
7f4388e591 | added total samples processed and tokens processed (len of text tokens + len of target response tokens) | 2023-08-28 11:02:45 -05:00
87c4bfedba | added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) | 2023-08-27 12:26:12 -05:00
0517d620b8 | fixes with the local backend | 2023-08-24 17:05:56 -05:00
736c077282 | ops | 2023-08-20 13:42:18 -05:00
b105f6211e | added ability to export weights mid-training, to avoid the CBT of yanking the weights while the training script is running | 2023-08-20 13:39:58 -05:00
fc576010ce | wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training, giving me time to eventually clear up disk space, than have it silently restart on its own | 2023-08-20 06:29:17 -05:00
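
A minimal sketch of the guard; the helper name is hypothetical:

```python
import traceback

def save_checkpoint_safely(save_fn, *args, **kwargs):
    try:
        save_fn(*args, **kwargs)
    except OSError:
        # most likely out of disk space; log it and keep training, leaving
        # time to clear space before the next save attempt
        traceback.print_exc()
```
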
2d1a9f10c0 | nightmare of spaghetti that might break compat; mechanism to increase the RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trims off excess levels according to what the model receives, some other things I already forgot (I really hope no one else has weights being baked right now) | 2023-08-19 15:06:33 -05:00
03872b823f | why did I type rglob, another 10 bucks down the drain... | 2023-08-17 00:11:29 -05:00
b5f247aa11 | just nuked about 9 hours of progress because I didn't make sure it pruned only on the global leader | 2023-08-16 23:37:52 -05:00
d7152fc7b9 | added pruning of old checkpoints if specified (cfg.trainer.keep_last_checkpoints) | 2023-08-16 20:12:12 -05:00
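
A minimal sketch combining this with the leader-only lesson from b5f247aa11 above; the checkpoint layout and helper name are assumptions:

```python
from pathlib import Path
import shutil

def prune_checkpoints(ckpt_dir, keep_last, global_rank):
    if global_rank != 0:  # prune only on the global leader
        return
    ckpts = sorted(Path(ckpt_dir).iterdir(), key=lambda p: p.stat().st_mtime)
    for old in ckpts[:-keep_last]:  # keep the newest keep_last entries
        shutil.rmtree(old) if old.is_dir() else old.unlink()
```
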
d7deaf6def | distributed training works now (hopefully) | 2023-08-13 22:07:45 -05:00
d89568a96e | some fixes for the local framework | 2023-08-05 03:22:15 +00:00
5970f254e3 | some fixes for the local framework | 2023-08-05 02:17:30 +00:00
012f54b7f1 | another classic commit so I can copy it to another machine to gut things out and use the trainer bits for a side project that I should really get around to working on sooner rather than later | 2023-08-04 14:21:30 -05:00
0a524f1d59 | reticulating splines | 2023-08-03 21:39:00 -05:00
608c1970eb | ops | 2023-08-03 20:36:19 -05:00
c85101403f | big cleanup | 2023-08-03 20:26:36 -05:00