Update 'Settings'

2023-03-07 04:37:06 +00:00 · 9fbff5148e
commit 9fbff5148e
parent a9eb3b924a
1 changed file with 2 additions and 1 deletion
--- a/Settings.md
+++ b/Settings.md
@@ -16,7 +16,6 @@ Below are settings that override the default launch arguments. Some of these req
 * `Use CUDA for Voice Fixer`: allows voicefixer to use CUDA. Speeds up cleaning the output, but at the cost of more VRAM consumed. Disable if you OOM.
 * `Do Not Load TTS On Startup`: skips loading TorToiSe on initialization, but will get loaded when anything that requires it needs it. This is useful if you're doing non-TTS functions that require VRAM, but you'll OOM while doing it when the model is loaded (for example, training).
 * `Delete Non-Final Output`: if enabled and using multi-line generation, it will delete the individual pieces after combining. If enabled and using Voicefixer, it will remove the un-fixed file. Useful for reducing clutter.
-* `Use BigVGAN Vocoder`: uses [NVIDIA/BigVGAN](https://github.com/NVIDIA/BigVGAN) as the vocoder instead of the default one. Offers a slight improvement when generating the waveform.
 * `Device Override`: overrides the device name used to pass to PyTorch for hardware acceleration. You can use the accompanied `list_devices.py` script to map valid strings to GPU names. You can also pass `cpu` if you want to fallback to software mode.

 * `Sample Batch Size`: sets the batch size when generating autoregressive samples. Bigger batches result in faster compute, at the cost of increased VRAM consumption. Leave to 0 to calculate a "best" fit.
@@ -24,6 +23,8 @@ Below are settings that override the default launch arguments. Some of these req
 * `Auto-Calculate Voice Chunk Duration (in seconds)`: for automatically suggesting a voice chunk size, this value will divide the total duration of a voice's input samples. For example, 100 seconds worth of audio with this value as 10 will give 10 chunks. This is to make people stop shitting their pants when they OOM from not adjusting the `Voice Chunk` slider.
 * `Output Volume`: adjusts the volume through amplitude scaling.
 * `Autoregressive Model`: the autoregressive model to use for generating audio output. This will look for models under `./models/finetunes/` and `./training/{voice}-finetune/models/`.
+	- select "auto" to automatically select one based on the current voice loaded.
+* `Vocoder Model`: selects which vocoder to use. Univnet is the default vocoder, while BigVGAN is a better one.
 * `Whisper Model`: the specific model to use for Whisper transcription, when preparing a dataset to finetune with.
 * `Use Whisper.cpp`: leverages [lightmare/whispercpp.py](https://git.ecker.tech/lightmare/whispercpp.py) for transcription and trimming. **!**NOTE**!** this is highly experimental, and I haven't actually tested this myself. There's some caveats.
 * `Refresh Model List`: updates the above dropdown with models
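
Note on `Auto-Calculate Voice Chunk Duration (in seconds)`: the arithmetic described in that entry (100 seconds of audio with a value of 10 gives 10 chunks) can be sketched roughly as below. This is only an illustration, not the web UI's actual code: the function name `suggest_voice_chunks` is hypothetical, and rounding up with a minimum of one chunk is an assumption the settings text does not confirm.

```python
import math


def suggest_voice_chunks(total_duration_s: float, chunk_duration_s: float) -> int:
    """Hypothetical helper: suggest a chunk count from the total sample duration.

    Mirrors the description in the settings text: total duration divided by
    the configured chunk duration. Rounding up and the floor of one chunk
    are assumptions made for this sketch.
    """
    if chunk_duration_s <= 0:
        raise ValueError("chunk_duration_s must be positive")
    # Assumption: round up so a trailing partial chunk is still counted,
    # and never suggest fewer than one chunk.
    return max(1, math.ceil(total_duration_s / chunk_duration_s))


# The example from the settings text: 100 seconds of audio, value of 10.
print(suggest_voice_chunks(100, 10))  # 10
```

A larger chunk count means each chunk holds less audio, which is why raising it is the usual response to an OOM.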