|
5026d93ecd
|
sloppy fix to actually kill children when using multi-GPU distributed training, set GPU training count based on what CUDA exposes automatically so I don't have to keep setting it to 2
|
2023-03-04 20:42:54 +00:00 |
|
|
1a9d159b2a
|
forgot to add 'bs / gradient accum < 2' clamp validation logic
|
2023-03-04 17:37:08 +00:00 |
|
|
df24827b9a
|
renamed mega batch factor to an actual real term: gradient accumulation factor, fixed halting training not actually killing the training process and freeing up resources, some logic cleanup for gradient accumulation (so many brain worms and wrong assumptions from testing on low batch sizes) (read the training section in the wiki for more details)
|
2023-03-04 15:55:06 +00:00 |
|
|
6d5e1e1a80
|
fixed user inputted LR schedule not actually getting used (oops)
|
2023-03-04 04:41:56 +00:00 |
|
|
6d8c2dd459
|
auto-suggested voice chunk size is based on the total duration of the voice files divided by 10 seconds, added setting to adjust the auto-suggested division factor (a really oddly worded one), because I'm sure people will OOM blindly generating without adjusting this slider
|
2023-03-03 21:13:48 +00:00 |
|
|
e1f3ffa08c
|
oops
|
2023-03-03 18:51:33 +00:00 |
|
|
9fb4aa7917
|
validated whispercpp working, fixed args.listen not being saved due to brainworms
|
2023-03-03 07:23:10 +00:00 |
|
|
740b5587df
|
added option to specify using BigVGAN as the vocoder for mrq/tortoise-tts
|
2023-03-03 06:39:37 +00:00 |
|
|
68f4858ce9
|
oops
|
2023-03-03 05:51:17 +00:00 |
|
|
e859a7c01d
|
experimental multi-gpu training (Linux only, because I can't into batch files)
|
2023-03-03 04:37:18 +00:00 |
|
|
c956d81baf
|
added button to just load a training set's loss information, added installing broncotc/bitsandbytes-rocm when running setup-rocm.sh
|
2023-03-02 01:35:12 +00:00 |
|
|
534a761e49
|
added loading/saving of voice latents by model hash, so no more needing to manually regenerate every time you change models
|
2023-03-02 00:46:52 +00:00 |
|
|
5a41db978e
|
oops
|
2023-03-01 19:39:43 +00:00 |
|
|
b989123bd4
|
leverage tensorboard to parse tb_logger files when starting training (it seems to give a nicer resolution of training data, need to see about reading it directly while training)
|
2023-03-01 19:32:11 +00:00 |
|
|
c2726fa0d4
|
added new training tunable: loss_text_ce_loss weight, added option to specify source model in case you want to finetune a finetuned model (for example, train a Japanese finetune on a large dataset, then finetune for a specific voice, need to truly validate if it produces usable output), some bug fixes that came up for some reason now and not earlier
|
2023-03-01 01:17:38 +00:00 |
|
|
5037752059
|
oops
|
2023-02-28 22:13:21 +00:00 |
|
|
787b44807a
|
added to embedded metadata: datetime, model path, model hash
|
2023-02-28 15:36:06 +00:00 |
|
|
81eb58f0d6
|
show different losses, rewordings
|
2023-02-28 06:18:18 +00:00 |
|
|
fda47156ec
|
oops
|
2023-02-28 01:08:07 +00:00 |
|
|
bc0d9ab3ed
|
added graph to chart loss_gpt_total rate, added option to prune X number of previous models/states, something else
|
2023-02-28 01:01:50 +00:00 |
|
|
6925ec731b
|
I don't remember.
|
2023-02-27 19:20:06 +00:00 |
|
|
92553973be
|
Added option to disable bitsandbytes optimizations for systems that do not support it (systems without a Turing-onward Nvidia card), saves use of float16 and bitsandbytes for training into the config json
|
2023-02-26 01:57:56 +00:00 |
|
|
aafeb9f96a
|
actually fixed the training output text parser
|
2023-02-25 16:44:25 +00:00 |
|
|
65329dba31
|
oops, epoch increments twice
|
2023-02-25 15:31:18 +00:00 |
|
|
8b4da29d5f
|
some adjustments to the training output parser, now updates per iteration for really large batches (like the one I'm doing for a dataset size of 19420)
|
2023-02-25 13:55:25 +00:00 |
|
|
d5d8821a9d
|
fixed some files not copying for bitsandbytes (I was wrong to assume it copied folders too), fixed stopping generating and training, some other thing that I forgot since it's been slowly worked on in my small free times
|
2023-02-24 23:13:13 +00:00 |
|
|
2104dbdbc5
|
oops
|
2023-02-24 13:05:08 +00:00 |
|
|
f6d0b66e10
|
finally added model refresh button, also searches in the training folder for outputted models so you don't even need to copy them
|
2023-02-24 12:58:41 +00:00 |
|
|
1e0fec4358
|
god i finally found some time and focus: reworded print/save freq per epoch => print/save freq (in epochs), added import config button to reread the last used settings (will check for the output folder's configs first, then the generated ones) and auto-grab the last resume state (if available), some other cleanups i genuinely don't remember what I did when I spaced out for 20 minutes
|
2023-02-23 23:22:23 +00:00 |
|
|
7d1220e83e
|
forgot to mult by batch size
|
2023-02-23 15:38:04 +00:00 |
|
|
487f2ebf32
|
fixed the brain worm discrepancy between epochs, iterations, and steps
|
2023-02-23 15:31:43 +00:00 |
|
|
1cbcf14cff
|
oops
|
2023-02-23 13:18:51 +00:00 |
|
|
225dee22d4
|
huge success
|
2023-02-23 06:24:54 +00:00 |
|
|
526a430c2a
|
how did this revert...
|
2023-02-22 13:24:03 +00:00 |
|
|
93b061fb4d
|
oops
|
2023-02-22 03:21:03 +00:00 |
|
|
c4b41e07fa
|
properly placed the line to extract starting iteration
|
2023-02-22 01:17:09 +00:00 |
|
|
fefc7aba03
|
oops
|
2023-02-21 22:13:30 +00:00 |
|
|
9e64dad785
|
clamp batch size to sample count when generating for the sickos that want that, added setting to remove non-final output after a generation, something else I forgot already
|
2023-02-21 21:50:05 +00:00 |
|
|
f119993fb5
|
explicitly use python3 because some OSs will not have python alias to python3, allow batch size 1
|
2023-02-21 20:20:52 +00:00 |
|
|
8a1a48f31e
|
Added very experimental float16 training for cards with not enough VRAM (10GiB and below, maybe) !NOTE! this is VERY EXPERIMENTAL, I have zero free time to validate it right now, I'll do it later
|
2023-02-21 19:31:57 +00:00 |
|
|
ed2cf9f5ee
|
wrap checking for metadata when adding a voice in case it throws an error
|
2023-02-21 17:35:30 +00:00 |
|
|
b6f7aa6264
|
fixes
|
2023-02-21 04:22:11 +00:00 |
|
|
bbc2d26289
|
I finally figured out how to fix gr.Dropdown.change, so a lot of dumb UI decisions are fixed and make sense
|
2023-02-21 03:00:45 +00:00 |
|
|
1fd88afcca
|
updated notebook for newer setup structure, added formatting of getting it/s and last loss rate (have not tested loss rate yet)
|
2023-02-20 22:56:39 +00:00 |
|
|
37ffa60d14
|
brain worms forgot a global, hate global semantics
|
2023-02-20 15:31:38 +00:00 |
|
|
d17f6fafb0
|
clean up, reordered, added some rather liberal loading/unloading auxiliary models, can't really focus right now to keep testing it, report any issues and I'll get around to it
|
2023-02-20 00:21:16 +00:00 |
|
|
c99cacec2e
|
oops
|
2023-02-19 23:29:12 +00:00 |
|
|
ee95616dfd
|
optimize batch sizes to be as evenly divisible as possible (noticed the calculated epochs mismatched the inputted epochs)
|
2023-02-19 21:06:14 +00:00 |
|
|
6260594a1e
|
Forgot to base print/save frequencies in terms of epochs in the UI, will get converted when saving the YAML
|
2023-02-19 20:38:00 +00:00 |
|
|
4694d622f4
|
doing something completely unrelated had me realize it's 1000x easier to just base things in terms of epochs, and calculate iterations from there
|
2023-02-19 20:22:03 +00:00 |
|