Commit Graph

54 Commits

Author SHA1 Message Date
mrq
e50edc3b48 added a flag to convert to a HF compatible model on export by stitching things 2024-06-03 22:34:47 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
c2a436d368 somehow between training sessions grad_norm = None even though it worked before 2024-06-02 08:29:27 -05:00
mrq
827cf632e7 report current loss scale and adjust grad norm by loss scale (for deepspeed) 2024-06-01 10:44:32 -05:00
mrq
856545f8bb nan loss detection (should have added it earlier), loss scaling for local backend + fp16 2024-05-11 22:23:29 -05:00
mrq
88e9b9caff local ddp fix 2024-05-11 17:29:01 -05:00
mrq
71e373064f remove redundant loss, tweak readme 2024-05-11 15:02:47 -05:00
mrq
c22a177cf8 forgot to pass warmup to schedule free 2024-05-09 22:18:49 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
8aa1b2dabf documentation update 2024-05-04 21:03:46 -05:00
mrq
c494894261 simple DDP wrapper (for my NVlink test) 2024-05-04 11:48:26 -05:00
mrq
a7b43b98b5 renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) 2024-05-02 20:08:59 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
f0c4baeb25 added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) 2024-04-09 22:04:01 -05:00
mrq
4d75ee066c actually do the Linear replacement with TE's Linear 2024-04-09 14:41:13 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
7075c2a5f0 added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml) 2024-04-04 19:11:49 -05:00
mrq
91062361af tweaks 2024-03-01 20:38:06 -06:00
mrq
f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) 2024-03-01 20:18:43 -06:00
mrq
47435207f7 Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model 2024-03-01 19:20:10 -06:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
c690aa509d fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...) 2023-12-25 21:20:32 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
6c51a629cc resetting step count resets the samples processed and other metrics 2023-10-29 12:11:19 -05:00
mrq
32d4271ca8 fixed issue with training from scratch (oops) 2023-10-21 09:55:38 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 -05:00
mrq
893a610fad cleanup, use deepspeed inferencing pathway if requested 2023-10-09 15:24:04 -05:00
mrq
4abd6564d1 fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml 2023-09-23 19:59:00 -05:00
mrq
e7da1eb90d edge case 2023-09-20 19:20:17 -05:00
mrq
c0b25541e3 restructured some things with the model to remove dead weights 2023-09-20 19:10:59 -05:00
mrq
5ac119a6e7 added light web UI (need to port the telemetry disabling bandaids from aivc) 2023-09-09 16:17:20 -05:00
mrq
8837bc34d7 added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with convering an AR into an AR+NAR) 2023-09-07 18:19:51 -05:00
mrq
81b05dabb9 accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions) 2023-09-03 08:03:36 -05:00
mrq
57db3ccfa8 shuffled VALL-E continuous as a task tts-c instead, logic fixes for it 2023-09-02 12:23:40 -05:00
mrq
2f06166ddd cleanups 2023-09-01 21:33:51 -05:00
mrq
e40c0d34a0 somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype 2023-09-01 20:58:29 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
mrq
87c4bfedba added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
0517d620b8 fixes with the local backend 2023-08-24 17:05:56 -05:00
mrq
736c077282 ops 2023-08-20 13:42:18 -05:00
mrq
b105f6211e added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running 2023-08-20 13:39:58 -05:00
mrq
fc576010ce wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own 2023-08-20 06:29:17 -05:00
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
mrq
03872b823f why did I type rglob, another 10 bucks down the drain... 2023-08-17 00:11:29 -05:00
mrq
b5f247aa11 just nuked about 9 hours of progress because I didn't make sure it pruned only on the global leader 2023-08-16 23:37:52 -05:00
mrq
d7152fc7b9 added pruning of old checkpoints if specified (cfg.trainer.keep_last_checkpoints) 2023-08-16 20:12:12 -05:00
mrq
d7deaf6def distributed training works now (hopefully) 2023-08-13 22:07:45 -05:00
mrq
d89568a96e some fixes for the local framework 2023-08-05 03:22:15 +00:00
mrq
5970f254e3 some fixes for the local framework 2023-08-05 02:17:30 +00:00