Commit Graph

45 Commits

Author SHA1 Message Date
mrq
7047fcc6e2 actually make deepspeed work with LoRAs 2024-06-17 13:55:37 -05:00
mrq
1d159b1476 updated export routine to split LoRA weights from the state dict (should work with deepspeed) 2024-06-17 13:28:18 -05:00
mrq
726a4b613f naive, rudimentary DeepSpeed support (just live with the LoRA weights living with the original weights, they can be split later) 2024-06-17 13:17:24 -05:00
mrq
bd0bc10ec0 added LoRA policy to decide what layer of the model gets adapted based on simple inclusion/exclusion terms 2024-06-17 13:05:06 -05:00
mrq
45a39fb79f very rudimentary lora support (no deepspeed support, tested training and saving but not loading yet) 2024-06-17 00:09:16 -05:00
mrq
a7a6e0ac76 validated that inferencing works, changed some defaults (NAR benefits from greedy sampling) 2024-06-09 17:11:38 -05:00
mrq
4ade2b60ee ugh 2024-06-06 21:57:11 -05:00
mrq
fcac9503e2 cleanup 2024-06-06 13:08:02 -05:00
mrq
e50edc3b48 added a flag to convert to a HF compatible model on export by stitching things 2024-06-03 22:34:47 -05:00
mrq
934672252b feverish cleanup 2024-06-03 21:28:49 -05:00
mrq
c2a436d368 somehow between training sessions grad_norm = None even though it worked before 2024-06-02 08:29:27 -05:00
mrq
827cf632e7 report current loss scale and adjust grad norm by loss scale (for deepspeed) 2024-06-01 10:44:32 -05:00
mrq
856545f8bb nan loss detection (should have added it earlier), loss scaling for local backend + fp16 2024-05-11 22:23:29 -05:00
mrq
88e9b9caff local ddp fix 2024-05-11 17:29:01 -05:00
mrq
71e373064f remove redundant loss, tweak readme 2024-05-11 15:02:47 -05:00
mrq
8aa1b2dabf documentation update 2024-05-04 21:03:46 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
91062361af tweaks 2024-03-01 20:38:06 -06:00
mrq
f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) 2024-03-01 20:18:43 -06:00
mrq
3da1518ace added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work) 2024-01-31 21:48:36 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
6c51a629cc resetting step count resets the samples processed and other metrics 2023-10-29 12:11:19 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
c0b25541e3 restructured some things with the model to remove dead weights 2023-09-20 19:10:59 -05:00
mrq
5ac119a6e7 added light web UI (need to port the telemetry disabling bandaids from aivc) 2023-09-09 16:17:20 -05:00
mrq
8837bc34d7 added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with convering an AR into an AR+NAR) 2023-09-07 18:19:51 -05:00
mrq
81b05dabb9 accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions) 2023-09-03 08:03:36 -05:00
mrq
2f06166ddd cleanups 2023-09-01 21:33:51 -05:00
mrq
e40c0d34a0 somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype 2023-09-01 20:58:29 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
mrq
87c4bfedba added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
0517d620b8 fixes with the local backend 2023-08-24 17:05:56 -05:00
mrq
736c077282 ops 2023-08-20 13:42:18 -05:00
mrq
b105f6211e added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running 2023-08-20 13:39:58 -05:00
mrq
fc576010ce wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own 2023-08-20 06:29:17 -05:00
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
mrq
03872b823f why did I type rglob, another 10 bucks down the drain... 2023-08-17 00:11:29 -05:00
mrq
b5f247aa11 just nuked about 9 hours of progress because I didn't make sure it pruned only on the global leader 2023-08-16 23:37:52 -05:00
mrq
d7152fc7b9 added pruning of old checkpoints if specified (cfg.trainer.keep_last_checkpoints) 2023-08-16 20:12:12 -05:00
mrq
d7deaf6def distributed training works now (hopefully) 2023-08-13 22:07:45 -05:00
mrq
d89568a96e some fixes for the local framework 2023-08-05 03:22:15 +00:00
mrq
5970f254e3 some fixes for the local framework 2023-08-05 02:17:30 +00:00
mrq
0a524f1d59 reticulating splines 2023-08-03 21:39:00 -05:00
mrq
608c1970eb ops 2023-08-03 20:36:19 -05:00
mrq
c85101403f big cleanup 2023-08-03 20:26:36 -05:00