Commit Graph

67 Commits

Author SHA1 Message Date
mrq
31f71fa134 sampler update (some brainworm just never actually had a sampler for sample_type=path) 2024-06-14 16:55:40 -05:00
mrq
132a02c48b sanity cleanup, backup config yaml for each log file 2024-06-09 11:22:52 -05:00
mrq
4ade2b60ee ugh 2024-06-06 21:57:11 -05:00
mrq
fcac9503e2 cleanup 2024-06-06 13:08:02 -05:00
mrq
880b4ecd1b cleanup, putting some thoughts in comments before I forget about them 2024-06-05 19:50:06 -05:00
mrq
3cfc8a96bb oops 2024-06-05 10:30:04 -05:00
mrq
c1fcd889d5 reverted automatically disabling split loss calc, since it seems that it's actually cacling loss on prom causes the oddities, maybe 2024-06-01 12:34:59 -05:00
mrq
8cf176ab46 ugh 2024-06-01 10:46:42 -05:00
mrq
d0ebce6bac ugh 2024-06-01 10:30:13 -05:00
mrq
39bc019142 actually save per-rank sampler states 2024-06-01 09:46:32 -05:00
mrq
85f9684720 some cleanup 2024-05-25 17:46:52 -05:00
mrq
3337c69e5a leverage between xformers and torch.backends.cuda.sdp_kernel for attention 2024-05-11 17:14:05 -05:00
mrq
d33c7bb7cf ugh 2024-05-11 16:47:19 -05:00
mrq
0b6499601b sanitizing 2024-05-11 16:31:05 -05:00
mrq
bd0a36ba8d I swear I keep seeing tqdm flicker back a number 2024-05-10 18:36:01 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
277dcec484 apparently I got an error for trying to serialize an errant tensor that made its way into the json, this could be remedied easily with recursively traversing the dict and coercing any objects to primitives, but I'm tired and I just want to start training and nap 2024-05-04 12:33:43 -05:00
mrq
c494894261 simple DDP wrapper (for my NVlink test) 2024-05-04 11:48:26 -05:00
mrq
a7b43b98b5 renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes) 2024-05-02 20:08:59 -05:00
mrq
467fa1c5ee wrapper fixes 2024-04-16 10:19:02 -05:00
mrq
f0c4baeb25 added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it) 2024-04-09 22:04:01 -05:00
mrq
4d75ee066c actually do the Linear replacement with TE's Linear 2024-04-09 14:41:13 -05:00
mrq
9d97eb5104 added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale) 2024-04-08 20:14:51 -05:00
mrq
f3c59c3e7e cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed) 2024-03-01 20:18:43 -06:00
mrq
47435207f7 Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model 2024-03-01 19:20:10 -06:00
mrq
0427d8d076 logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable 2024-03-01 10:32:35 -06:00
mrq
35d78a2bb0 Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares) 2024-02-29 20:29:17 -06:00
mrq
cce929e136 nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1 2024-01-26 19:41:12 -06:00
mrq
9c198eb75a added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works) 2023-12-20 18:45:58 -06:00
mrq
32d4271ca8 fixed issue with training from scratch (oops) 2023-10-21 09:55:38 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
65f500083d tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work 2023-10-12 22:21:43 -05:00
mrq
893a610fad cleanup, use deepspeed inferencing pathway if requested 2023-10-09 15:24:04 -05:00
mrq
3db7e7dea1 implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things 2023-10-06 10:02:45 -05:00
mrq
4abd6564d1 fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml 2023-09-23 19:59:00 -05:00
mrq
9384900ce6 revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways 2023-09-22 13:04:17 -05:00
mrq
c0b25541e3 restructured some things with the model to remove dead weights 2023-09-20 19:10:59 -05:00
mrq
5ac119a6e7 added light web UI (need to port the telemetry disabling bandaids from aivc) 2023-09-09 16:17:20 -05:00
mrq
67617d7d69 also cull frozen_params in the params optimizer receives to reduce VRAM it consumes 2023-09-07 18:27:02 -05:00
mrq
8837bc34d7 added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with convering an AR into an AR+NAR) 2023-09-07 18:19:51 -05:00
mrq
ab5134f385 tweaks and fixes 2023-09-07 17:08:38 -05:00
mrq
e7a67410d1 oops 2023-09-07 09:14:03 -05:00
mrq
712808494f added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter 2023-09-06 20:33:16 -05:00
mrq
100ca6b7d0 added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing) 2023-09-06 18:58:35 -05:00
mrq
8a6c203277 added per-speaker samplers 2023-09-03 21:27:13 -05:00
mrq
81b05dabb9 accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions) 2023-09-03 08:03:36 -05:00
mrq
57db3ccfa8 shuffled VALL-E continuous as a task tts-c instead, logic fixes for it 2023-09-02 12:23:40 -05:00
mrq
7f4388e591 added total samples processed and tokens processed (len of text tokens + len of target response tokens) 2023-08-28 11:02:45 -05:00
mrq
87c4bfedba added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU) 2023-08-27 12:26:12 -05:00
mrq
0517d620b8 fixes with the local backend 2023-08-24 17:05:56 -05:00