Update 'Training'

master
mrq 2023-03-04 13:17:12 +07:00
parent 8414ca39d8
commit 25972d9a68
1 changed file with 1 addition and 1 deletion

@@ -62,7 +62,7 @@ This will generate the YAML necessary to feed into training. Here, you can set s
* `Text_CE LR Weight`: an experimental setting to govern how much weight to factor in with the provided learning rate. This is ***a highly experimental tunable***, and is only exposed so I don't need to edit it myself when testing it. ***Leave this to the default 0.01 unless you know what you are doing.***
* `Learning Rate Schedule`: a list of epochs on when to decay the learning rate. You really should leave this as the default.
* `Batch Size`: how large of a batch size for training. Larger batch sizes will result in faster training steps, but at the cost of increased VRAM consumption. This value must not exceed the size of your dataset, and your dataset size *should* be evenly divisible by it.
* `Mega Batch Factor`: According to the documentation, `DLAS also supports "mega batching", where multiple forward passes contribute to a single backward pass`. If you can spare the VRAM, I suppose you can bump this to 8. If you're pressed for VRAM, you can lower this down to 1. If you have really small batch sizes, use what the validator gives out.
* `Mega Batch Factor`: "Gradient accumulation factor". This was commented rather oddly, implying to decrease it to save on VRAM, when the inverse is true. If you're straining on VRAM, increase this, up to half of your batch size. I'm not too sure what the performance implicatons are from this, but I *feel* lower values will train faster.
* `Print Frequency`: how often (in epochs) the trainer should print its training statistics. Printing takes a little bit of time, but it's a nice way to gauge how a finetune is baking, as it lists your losses and other statistics. This is purely for debugging and babysitting whether a model is being trained adequately. The web UI *should* parse the information from stdout, grab the total loss, and report it back.
* `Save Frequency`: how often (in epochs) to save a copy of the model during training. It seems the training will save a normal copy, an `ema` version of the model, *AND* a backup archive containing both to resume from. If you're training on a Colab with your Drive mounted, these can easily rack up and eat your allotted space. You *can* delete older copies from training, but it's wise not to in case you want to resume from an older state.
* `Resume State Path`: the last training state saved to resume from. The placeholder value shows the general path structure. This will resume from whatever iteration it was last at and iterate from there until the target step count (for example, resuming from iteration 2500 while requesting 5000 iterations will iterate 2500 more times).
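
For the `Mega Batch Factor` above, here's a minimal sketch of gradient accumulation in plain PyTorch; it is *not* DLAS's actual implementation, and the model, sizes, and learning rate are placeholders. It shows why raising the factor eases VRAM pressure: each forward/backward pass only holds `batch_size / mega_batch_factor` samples in memory, while the optimizer still takes a single step for the full batch.

```python
# Minimal gradient accumulation ("mega batching") sketch, not DLAS's code.
# Peak VRAM scales with the chunk size (batch_size / mega_batch_factor),
# so a higher factor means smaller chunks and less VRAM per forward pass.
import torch

batch_size = 128          # illustrative values, not recommendations
mega_batch_factor = 4     # chunks per optimizer step; should divide batch_size

model = torch.nn.Linear(64, 10)   # stand-in for the real (much larger) model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()

# One "mega batch" worth of dummy data.
inputs = torch.randn(batch_size, 64)
targets = torch.randint(0, 10, (batch_size,))

optimizer.zero_grad()
for chunk_in, chunk_tgt in zip(inputs.chunk(mega_batch_factor),
                               targets.chunk(mega_batch_factor)):
    # Dividing the loss keeps the accumulated gradient equivalent to one
    # backward pass over the entire batch.
    loss = loss_fn(model(chunk_in), chunk_tgt) / mega_batch_factor
    loss.backward()       # gradients accumulate in .grad across chunks
optimizer.step()          # a single weight update for the whole mega batch
```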