|
e50edc3b48
|
added a flag to convert to an HF-compatible model on export by stitching things
|
2024-06-03 22:34:47 -05:00 |
|
|
934672252b
|
feverish cleanup
|
2024-06-03 21:28:49 -05:00 |
|
|
c2a436d368
|
somehow between training sessions grad_norm = None even though it worked before
|
2024-06-02 08:29:27 -05:00 |
|
|
827cf632e7
|
report current loss scale and adjust grad norm by loss scale (for deepspeed)
|
2024-06-01 10:44:32 -05:00 |
|
|
856545f8bb
|
nan loss detection (should have added it earlier), loss scaling for local backend + fp16
|
2024-05-11 22:23:29 -05:00 |
|
|
88e9b9caff
|
local ddp fix
|
2024-05-11 17:29:01 -05:00 |
|
|
71e373064f
|
remove redundant loss, tweak readme
|
2024-05-11 15:02:47 -05:00 |
|
|
c22a177cf8
|
forgot to pass warmup to schedule free
|
2024-05-09 22:18:49 -05:00 |
|
|
0d5d545a40
|
crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc.
|
2024-05-09 20:28:20 -05:00 |
|
|
8aa1b2dabf
|
documentation update
|
2024-05-04 21:03:46 -05:00 |
|
|
c494894261
|
simple DDP wrapper (for my NVlink test)
|
2024-05-04 11:48:26 -05:00 |
|
|
a7b43b98b5
|
renamed cfg.bitsandbytes to cfg.optimizations (and having it serve as cfg.optimizations.bitsandbytes)
|
2024-05-02 20:08:59 -05:00 |
|
|
545162195b
|
deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things
|
2024-04-15 19:54:32 -05:00 |
|
|
f0c4baeb25
|
added Adagrad (experimenting with it), added 'extended' model size (16 layers instead of 12, experimenting with it)
|
2024-04-09 22:04:01 -05:00 |
|
|
4d75ee066c
|
actually do the Linear replacement with TE's Linear
|
2024-04-09 14:41:13 -05:00 |
|
|
9d97eb5104
|
added FP8 support through NVIDIA/TransformerEngine, added RetNet_HF through syncdoth/RetNet (as an alternative to branch away from torchscale)
|
2024-04-08 20:14:51 -05:00 |
|
|
7075c2a5f0
|
added an option to allow injecting embeddings from another model, because it dawned upon me how valuable embeddings from a good model can be for subsequent trainings (defined under cfg.models._embeddings as a relative path to the yaml)
|
2024-04-04 19:11:49 -05:00 |
|
|
91062361af
|
tweaks
|
2024-03-01 20:38:06 -06:00 |
|
|
f3c59c3e7e
|
cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)
|
2024-03-01 20:18:43 -06:00 |
|
|
47435207f7
|
Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model
|
2024-03-01 19:20:10 -06:00 |
|
|
3da1518ace
|
added Mistral (non-Mixtral) backend, useless optimization when not training, proper adjustment of the LR for Prodigyopt through d_coeff (maybe), recurrent sampling for LLaMA/Mistral/Mixtral backends (again, doesn't actually work)
|
2024-01-31 21:48:36 -06:00 |
|
|
c690aa509d
|
fixes and compat (MoE-fying an existing model and retraining from there just ruins it after a second of audio...)
|
2023-12-25 21:20:32 -06:00 |
|
|
9c198eb75a
|
added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)
|
2023-12-20 18:45:58 -06:00 |
|
|
6c51a629cc
|
resetting step count resets the samples processed and other metrics
|
2023-10-29 12:11:19 -05:00 |
|
|
32d4271ca8
|
fixed issue with training from scratch (oops)
|
2023-10-21 09:55:38 -05:00 |
|
|
09cda7d3f9
|
added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup
|
2023-10-16 19:30:38 -05:00 |
|
|
65f500083d
|
tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work
|
2023-10-12 22:21:43 -05:00 |
|
|
893a610fad
|
cleanup, use deepspeed inferencing pathway if requested
|
2023-10-09 15:24:04 -05:00 |
|
|
4abd6564d1
|
fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml
|
2023-09-23 19:59:00 -05:00 |
|
|
e7da1eb90d
|
edge case
|
2023-09-20 19:20:17 -05:00 |
|
|
c0b25541e3
|
restructured some things with the model to remove dead weights
|
2023-09-20 19:10:59 -05:00 |
|
|
5ac119a6e7
|
added light web UI (need to port the telemetry disabling bandaids from aivc)
|
2023-09-09 16:17:20 -05:00 |
|
|
8837bc34d7
|
added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with converting an AR into an AR+NAR)
|
2023-09-07 18:19:51 -05:00 |
|
|
81b05dabb9
|
accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions)
|
2023-09-03 08:03:36 -05:00 |
|
|
57db3ccfa8
|
shuffled VALL-E continuous as a task tts-c instead, logic fixes for it
|
2023-09-02 12:23:40 -05:00 |
|
|
2f06166ddd
|
cleanups
|
2023-09-01 21:33:51 -05:00 |
|
|
e40c0d34a0
|
somewhat got recurrent forward working (it's as accurate as chunkwise forward: it's not accurate at all), added option to use AMP instead of blanket setting the weight's dtype
|
2023-09-01 20:58:29 -05:00 |
|
|
7f4388e591
|
added total samples processed and tokens processed (len of text tokens + len of target response tokens)
|
2023-08-28 11:02:45 -05:00 |
|
|
87c4bfedba
|
added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)
|
2023-08-27 12:26:12 -05:00 |
|
|
0517d620b8
|
fixes with the local backend
|
2023-08-24 17:05:56 -05:00 |
|
|
736c077282
|
oops
|
2023-08-20 13:42:18 -05:00 |
|
|
b105f6211e
|
added ability to export weights mid-training, to avoid the CBT of yanking the weights while the training script is running
|
2023-08-20 13:39:58 -05:00 |
|
|
fc576010ce
|
wrapped saving the checkpoint in a try/catch so I can stop waking up to the damn trainer crashing because it ran out of disk space; I'd much rather it keep training to give me time to eventually clear up disk space rather than it silently restarting on its own
|
2023-08-20 06:29:17 -05:00 |
|
|
2d1a9f10c0
|
nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trims off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)
|
2023-08-19 15:06:33 -05:00 |
|
|
03872b823f
|
why did I type rglob, another 10 bucks down the drain...
|
2023-08-17 00:11:29 -05:00 |
|
|
b5f247aa11
|
just nuked about 9 hours of progress because I didn't make sure it pruned only on the global leader
|
2023-08-16 23:37:52 -05:00 |
|
|
d7152fc7b9
|
added pruning of old checkpoints if specified (cfg.trainer.keep_last_checkpoints)
|
2023-08-16 20:12:12 -05:00 |
|
|
d7deaf6def
|
distributed training works now (hopefully)
|
2023-08-13 22:07:45 -05:00 |
|
|
d89568a96e
|
some fixes for the local framework
|
2023-08-05 03:22:15 +00:00 |
|
|
5970f254e3
|
some fixes for the local framework
|
2023-08-05 02:17:30 +00:00 |
|