caveats while I tighten some nuts

2023-02-17 17:44:52 +00:00
commit 13c9920b7f
parent 8d268bc7a3
2 changed files with 13 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -223,6 +223,9 @@ To import a voice, click `Import Voice`. Remember to click `Refresh Voice List`

 This tab will contain a collection of sub-tabs pertaining to training.

+**!**NOTE**!**: training is still in its infancy. It was cobbled together to get a good baseline to iterate from, so expect some cruft while I tighten things down. Please be patient and understanding if something goes wrong.
+
+
 #### Prepare Dataset

 This section will aid in preparing the dataset for fine-tuning.
@@ -233,6 +236,7 @@ The web UI will leverage [openai/whisper](https://github.com/openai/whisper) to

 **!**NOTE**!**: transcription leverages FFMPEG, so please make sure you either have FFMPEG installed and visible on your PATH, or drop the binary in the `./bin/` folder.

+
 #### Generate Configuration

 This will generate the YAML necessary to feed into training. For now, you can set:
@@ -250,6 +254,14 @@ wavs/LJ001-0002.wav|in being comparatively modern.|in being comparatively modern
 * `Validation Name`: **!**TODO**!**: fill
 * `Validation Path`: path for the validation set, similar to the dataset. I'm not sure what should really be used for this, so purely for testing, I just copied the training dataset text.

+#### Train
+
+After preparing your dataset and configuration file, you are ready to train. Simply select a generated configuration file, click train, then keep an eye on the console window for output.
+
+Please be advised that integration is very much in its infancy.
+
+**!**NOTE**!**: for now, you must place a `dvae.pth` file in `./models/tortoise/`. I'll add a way to automatically grab it during initialization soon.
+
 ### Settings

 This tab (should) hold a bunch of other settings, from tunables that shouldn't be tampered with, to settings pertaining to the web UI itself.
--- a/src/utils.py
+++ b/src/utils.py
@@ -53,7 +53,7 @@ def setup_args():
 		'sample-batch-size': None,
 		'embed-output-metadata': True,
 		'latents-lean-and-mean': True,
-		'voice-fixer': True,
+		'voice-fixer': False, # I'm tired of long initialization of Colab notebooks
 		'voice-fixer-use-cuda': True,
 		'force-cpu-for-conditioning-latents': False,
 		'device-override': None,
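
The README notes in this commit name two manual prerequisites: an FFMPEG binary on the PATH or in `./bin/`, and a `dvae.pth` file in `./models/tortoise/`. The sketch below checks both before transcription or training starts. It is a minimal, hypothetical helper and not part of this commit: the function names `find_ffmpeg` and `preflight`, and the binary filenames `ffmpeg` / `ffmpeg.exe`, are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def find_ffmpeg(bin_dir="./bin"):
    """Return a path to an ffmpeg binary, or None if none is found.

    Hypothetical helper: looks on PATH first, then in the local bin folder.
    """
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    # Assumed filenames for a binary dropped into ./bin/ (Linux/macOS, Windows).
    for name in ("ffmpeg", "ffmpeg.exe"):
        candidate = Path(bin_dir) / name
        if candidate.is_file():
            return str(candidate)
    return None


def preflight(models_dir="./models/tortoise", bin_dir="./bin"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if find_ffmpeg(bin_dir) is None:
        problems.append(f"ffmpeg not found on PATH or in {bin_dir}")
    if not (Path(models_dir) / "dvae.pth").is_file():
        problems.append(f"dvae.pth not found in {models_dir}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing prerequisite:", problem)
```

Returning a list of problems rather than raising on the first one lets the caller report every missing prerequisite in a single pass.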