From 13c9920b7ff566fb24d478943c64bbfa0e37ad08 Mon Sep 17 00:00:00 2001
From: mrq
Date: Fri, 17 Feb 2023 17:44:52 +0000
Subject: [PATCH] caveats while I tighten some nuts

---
 README.md    | 12 ++++++++++++
 src/utils.py |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 45a7e76..b4724cb 100755
--- a/README.md
+++ b/README.md
@@ -223,6 +223,9 @@ To import a voice, click `Import Voice`. Remember to click `Refresh Voice List`
 
 This tab will contain a collection of sub-tabs pertaining to training.
 
+**!**NOTE**!**: training is still in it's infancy, as this was cobbled together to get a good baseline to iterate from afterwards, so be warned of the cruft as I tighten things down. I advise to be patient and understanding if something goes wrong.
+
+
 #### Prepare Dataset
 
 This section will aid in preparing the dataset for fine-tuning.
@@ -233,6 +236,7 @@ The web UI will leverage [openai/whisper](https://github.com/openai/whisper) to
 
 **!**NOTE**!**: transcription leverages FFMPEG, so please make sure you either have an FFMPEG installed visible to your PATH, or drop the binary in the `./bin/` folder.
 
+
 #### Generate Configuration
 
 This will generate the YAML necessary to feed into training. For now, you can set:
@@ -250,6 +254,14 @@ wavs/LJ001-0002.wav|in being comparatively modern.|in being comparatively modern
 * `Validation Name`: **!**TODO**!**: fill
 * `Validation Path`: path for the validation set, similar to the dataset. I'm not necessarily sure what to really use for this, so explicitly for testing, I just copied the training dataset text
 
+#### Train
+
+After preparing your dataset and configuration file, you are ready to train. Simply select a generated configuration file, click train, then keep an eye on the console window for output.
+
+Please be advised that integration is very much in its infancy.
+
+**!**NOTE**!**: for now, you must provide a `dvae.pth` file into `./models/tortoise/`. I'll add in a way to automatically grab it during initialization soon.
+
 ### Settings
 
 This tab (should) hold a bunch of other settings, from tunables that shouldn't be tampered with, to settings pertaining to the web UI itself.
diff --git a/src/utils.py b/src/utils.py
index 25bacc9..e7601b0 100755
--- a/src/utils.py
+++ b/src/utils.py
@@ -53,7 +53,7 @@ def setup_args():
 		'sample-batch-size': None,
 		'embed-output-metadata': True,
 		'latents-lean-and-mean': True,
-		'voice-fixer': True,
+		'voice-fixer': False, # I'm tired of long initialization of Colab notebooks
 		'voice-fixer-use-cuda': True,
 		'force-cpu-for-conditioning-latents': False,
 		'device-override': None,
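
The two README notes in this patch name manual prerequisites: an FFMPEG binary visible on PATH or dropped in `./bin/`, and a `dvae.pth` file in `./models/tortoise/`. Until the automatic download mentioned above exists, a quick pre-flight check along these lines might save a failed run. This is only a sketch and is not part of the patch: the helper names `find_ffmpeg` and `missing_prerequisites` are hypothetical, and the binary file names `ffmpeg` / `ffmpeg.exe` are an assumption about what would be dropped into `./bin/`.

```python
import shutil
from pathlib import Path


def find_ffmpeg(bin_dir="./bin"):
    """Return a path to an ffmpeg executable, or None if none is found.

    Hypothetical helper: checks PATH first, then the local bin folder.
    """
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    # Assumed file names for a binary dropped into ./bin/
    for name in ("ffmpeg", "ffmpeg.exe"):
        candidate = Path(bin_dir) / name
        if candidate.is_file():
            return str(candidate)
    return None


def missing_prerequisites(root="."):
    """List human-readable problems that would block transcription or training."""
    root = Path(root)
    problems = []
    if find_ffmpeg(root / "bin") is None:
        problems.append("ffmpeg not found on PATH or in ./bin/")
    if not (root / "models" / "tortoise" / "dvae.pth").is_file():
        problems.append("dvae.pth not found in ./models/tortoise/")
    return problems


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Run from the repository root, the script prints one line per missing prerequisite and stays silent otherwise.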