caveats while I tighten some nuts

mrq 2023-02-17 17:44:52 +00:00
parent 8d268bc7a3
commit 13c9920b7f
2 changed files with 13 additions and 1 deletions

@@ -223,6 +223,9 @@ To import a voice, click `Import Voice`. Remember to click `Refresh Voice List`
This tab will contain a collection of sub-tabs pertaining to training.
**!**NOTE**!**: training is still in its infancy, as this was cobbled together to get a good baseline to iterate from afterwards, so be warned of the cruft as I tighten things down. Please be patient and understanding if something goes wrong.
#### Prepare Dataset
This section will aid in preparing the dataset for fine-tuning.
@@ -233,6 +236,7 @@ The web UI will leverage [openai/whisper](https://github.com/openai/whisper) to
**!**NOTE**!**: transcription leverages FFMPEG, so please make sure you either have FFMPEG installed and visible on your PATH, or drop the binary in the `./bin/` folder.
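If you want to check this before kicking off a transcription, something like the sketch below should do. The `find_ffmpeg` helper is hypothetical and not part of the web UI; it only assumes that the binary is named `ffmpeg` and lives either on your PATH or in `./bin/`.

```python
import shutil
from typing import Optional


def find_ffmpeg(bin_dir: str = "./bin/") -> Optional[str]:
    """Return the path to an ffmpeg executable, or None if none is found.

    Hypothetical helper: the web UI may resolve the binary differently.
    """
    # First, look for ffmpeg on the PATH.
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    # Otherwise, fall back to the local bin folder.
    return shutil.which("ffmpeg", path=bin_dir)


if __name__ == "__main__":
    location = find_ffmpeg()
    if location is None:
        print("ffmpeg not found: install it or drop the binary in ./bin/")
    else:
        print(f"using ffmpeg at {location}")
```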
#### Generate Configuration
This will generate the YAML necessary to feed into training. For now, you can set:
@@ -250,6 +254,14 @@ wavs/LJ001-0002.wav|in being comparatively modern.|in being comparatively modern
* `Validation Name`: **!**TODO**!**: fill
* `Validation Path`: path for the validation set, similar to the dataset. I'm not sure what is best to use for this, so purely for testing, I just copied the training dataset text.
#### Train
After preparing your dataset and configuration file, you are ready to train. Simply select a generated configuration file, click train, then keep an eye on the console window for output.
Please be advised that integration is very much in its infancy.
**!**NOTE**!**: for now, you must place a `dvae.pth` file in `./models/tortoise/`. I'll add a way to automatically grab it during initialization soon.
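To avoid finding out mid-run that the file is missing, you could check for it up front. This is only a sketch: `check_dvae` is a hypothetical helper, not something the web UI provides, and it checks nothing beyond the file existing at the path named above.

```python
from pathlib import Path


def check_dvae(models_dir: str = "./models/tortoise/") -> Path:
    """Return the path to dvae.pth, raising if the file is not there.

    Hypothetical helper: it only checks that the file exists, not that
    it is a valid checkpoint.
    """
    path = Path(models_dir) / "dvae.pth"
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected dvae.pth at {path}; place it there before training."
        )
    return path
```

Calling `check_dvae()` before clicking train will raise a `FileNotFoundError` with the expected location if the file has not been placed yet.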
### Settings
This tab (should) hold a bunch of other settings, from tunables that shouldn't be tampered with, to settings pertaining to the web UI itself.

@@ -53,7 +53,7 @@ def setup_args():
'sample-batch-size': None,
'embed-output-metadata': True,
'latents-lean-and-mean': True,
-'voice-fixer': True,
+'voice-fixer': False, # I'm tired of long initialization of Colab notebooks
'voice-fixer-use-cuda': True,
'force-cpu-for-conditioning-latents': False,
'device-override': None,