|
f3c59c3e7e
|
cleaner replacement code (because I realized BitNet had an implementation for it too), added calculating gradient norm and performing gradient clipping in local trainer (non-deepspeed)
|
2024-03-01 20:18:43 -06:00 |
|
|
47435207f7
|
Added cfg.bitsandbytes.replace as a less intrusive alternative to cfg.bitsandbytes.inject to replace all Linear modules in a model
|
2024-03-01 19:20:10 -06:00 |
|
|
0427d8d076
|
logger broke for some reason, added flag to just tqdm.write instead, make cfg.bitsandbytes.bitnet==True yamls denoted since I'm sure they're not interoperable
|
2024-03-01 10:32:35 -06:00 |
|
|
35d78a2bb0
|
Yet Another Underlying Transformer Implementation (BitNet, will give it a few days to see how it fares)
|
2024-02-29 20:29:17 -06:00 |
|
|
cce929e136
|
nasty hotfix for transformer's Mixtral throwing an error when batch sizes > 1
|
2024-01-26 19:41:12 -06:00 |
|
|
9c198eb75a
|
added torchscale XMOE integration (because Mixtral 8x7B seems very promising and I want to see if it works)
|
2023-12-20 18:45:58 -06:00 |
|
|
32d4271ca8
|
fixed issue with training from scratch (oops)
|
2023-10-21 09:55:38 -05:00 |
|
|
09cda7d3f9
|
added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup
|
2023-10-16 19:30:38 -05:00 |
|
|
65f500083d
|
tweaks to try and get deepspeed quantized inferencing, validating bitsandbytes and deepspeed quantization, nothing seems to work
|
2023-10-12 22:21:43 -05:00 |
|
|
893a610fad
|
cleanup, use deepspeed inferencing pathway if requested
|
2023-10-09 15:24:04 -05:00 |
|
|
3db7e7dea1
|
implicitly load checkpoint if deepspeed checkpoint not found, updated setup script to grab the diskcached dataloader things
|
2023-10-06 10:02:45 -05:00 |
|
|
4abd6564d1
|
fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml
|
2023-09-23 19:59:00 -05:00 |
|
|
9384900ce6
|
revert the frankensteined "train one model but hotload the other" since it kept loading the last exported weights and I'm not supporting this usecase anymore anyways
|
2023-09-22 13:04:17 -05:00 |
|
|
c0b25541e3
|
restructured some things with the model to remove dead weights
|
2023-09-20 19:10:59 -05:00 |
|
|
5ac119a6e7
|
added light web UI (need to port the telemetry disabling bandaids from aivc)
|
2023-09-09 16:17:20 -05:00 |
|
|
67617d7d69
|
also cull frozen_params in the params optimizer receives to reduce VRAM it consumes
|
2023-09-07 18:27:02 -05:00 |
|
|
8837bc34d7
|
added option to specify parameters to freeze per-model in YAML (because I need to see about committing atrocities with convering an AR into an AR+NAR)
|
2023-09-07 18:19:51 -05:00 |
|
|
ab5134f385
|
tweaks and fixes
|
2023-09-07 17:08:38 -05:00 |
|
|
e7a67410d1
|
oops
|
2023-09-07 09:14:03 -05:00 |
|
|
712808494f
|
added support for optional prodigy optimizer (https://github.com/konstmish/prodigy) although it consumes a lot more VRAM per parameter
|
2023-09-06 20:33:16 -05:00 |
|
|
100ca6b7d0
|
added option to use SGD optimizer through the YAML, added option to pass in additional optimizer parameters through the YAML, added experimental unified AR+NAR model (does not seem fruitful in testing)
|
2023-09-06 18:58:35 -05:00 |
|
|
8a6c203277
|
added per-speaker samplers
|
2023-09-03 21:27:13 -05:00 |
|
|
81b05dabb9
|
accurate epoch metric is now reported (based on samples processed / length of dataset's paths, rather than naive assumptions)
|
2023-09-03 08:03:36 -05:00 |
|
|
57db3ccfa8
|
shuffled VALL-E continuous as a task tts-c instead, logic fixes for it
|
2023-09-02 12:23:40 -05:00 |
|
|
7f4388e591
|
added total samples processed and tokens processed (len of text tokens + len of target response tokens)
|
2023-08-28 11:02:45 -05:00 |
|
|
87c4bfedba
|
added ability to mark models as disabled for training, and hotloading them for eval/validation (useful if training only one model, or training a model per GPU)
|
2023-08-27 12:26:12 -05:00 |
|
|
0517d620b8
|
fixes with the local backend
|
2023-08-24 17:05:56 -05:00 |
|
|
501a857d5d
|
ops
|
2023-08-23 17:03:25 -05:00 |
|
|
4585824cd3
|
tweaks, including exporting on save/quit
|
2023-08-23 16:43:03 -05:00 |
|
|
b105f6211e
|
added ability to export weights mid-training to avoid CBT to yank the weights while the training script is running
|
2023-08-20 13:39:58 -05:00 |
|
|
2d1a9f10c0
|
nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now)
|
2023-08-19 15:06:33 -05:00 |
|
|
0b46c1e312
|
god I am inexperienced with retaining compat from previous weights, I hope no one actually has weights
|
2023-08-18 21:29:20 -05:00 |
|
|
2a71486cb6
|
preparing for SpeechX extensions
|
2023-08-18 20:58:07 -05:00 |
|
|
ced31fd9b7
|
removed the sampler as it's very misleading
|
2023-08-18 14:47:48 -05:00 |
|
|
599e47a813
|
might fix user inputted saving/quitting breaking when distributed
|
2023-08-15 23:52:20 -05:00 |
|
|
13571380be
|
made exporter make more sense
|
2023-08-13 22:56:28 -05:00 |
|
|
d7deaf6def
|
distributed training works now (hopefully)
|
2023-08-13 22:07:45 -05:00 |
|
|
2af09d0bef
|
fixed that mysterious discepancy between the reported losses (I am so freaking mad, my piss is boiling, I had to interrupt halfway through an epoch)
|
2023-08-05 15:25:41 -05:00 |
|
|
d89568a96e
|
some fixes for the local framework
|
2023-08-05 03:22:15 +00:00 |
|
|
c85101403f
|
big cleanup
|
2023-08-03 20:26:36 -05:00 |
|
|
2e03e5ac93
|
Fixed an issue with having fairseq installed at all will brick logging
|
2023-08-02 22:57:10 -05:00 |
|
|
f6597e2dfe
|
adjustments
|
2023-08-02 18:36:26 -05:00 |
|
|
0f9b81de75
|
oops
|
2023-08-02 18:12:36 -05:00 |
|
|
bf8cedc9dd
|
Rewrite init
|
2023-08-02 21:53:35 +00:00 |
|