forked from camenduru/ai-voice-cloning
caveats while I tighten some nuts
parent 8d268bc7a3
commit 13c9920b7f
README.md (12 lines changed)
@ -223,6 +223,9 @@ To import a voice, click `Import Voice`. Remember to click `Refresh Voice List`
This tab will contain a collection of sub-tabs pertaining to training.

**!**NOTE**!**: training is still in its infancy. This was cobbled together to get a good baseline to iterate from, so expect some cruft while I tighten things down. Please be patient and understanding if something goes wrong.
#### Prepare Dataset
This section will aid in preparing the dataset for fine-tuning.
@ -233,6 +236,7 @@ The web UI will leverage [openai/whisper](https://github.com/openai/whisper) to
**!**NOTE**!**: transcription relies on FFmpeg, so make sure an FFmpeg binary is either visible on your PATH or placed in the `./bin/` folder.
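The lookup order described in the note can be sketched as follows. This is a minimal illustration, not the web UI's actual code: `find_ffmpeg` and its `bin_dir` parameter are hypothetical names, and the only library calls used are `shutil.which` and `pathlib.Path`.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_ffmpeg(bin_dir: str = "./bin") -> Optional[str]:
    """Return a path to an ffmpeg executable, or None if none is found.

    Hypothetical helper: checks PATH first, then a local bin folder.
    """
    # 1. Anything visible on PATH wins.
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path

    # 2. Fall back to a binary dropped into the local bin folder.
    exe_name = "ffmpeg.exe" if sys.platform == "win32" else "ffmpeg"
    candidate = Path(bin_dir) / exe_name
    if candidate.is_file():
        return str(candidate)

    # Nothing found; the caller decides how to report it.
    return None
```

Calling `find_ffmpeg()` before transcription and stopping early when it returns `None` gives a clearer error than a failure deep inside the transcription step.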
#### Generate Configuration
This will generate the YAML necessary to feed into training. For now, you can set:
@ -250,6 +254,14 @@ wavs/LJ001-0002.wav|in being comparatively modern.|in being comparatively modern
* `Validation Name`: **!**TODO**!**: fill
* `Validation Path`: path to the validation set, in the same format as the dataset. I'm not sure yet what is best to use here, so for testing I just copied the training dataset text.
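As a stopgap matching the `Validation Path` note, the validation file can simply be a copy of the training text. A minimal sketch, assuming pipe-delimited entries like the `wavs/LJ001-0002.wav|...` lines in the dataset; the helper `copy_as_validation` and any file names passed to it are hypothetical, not names the web UI requires.

```python
import shutil
from pathlib import Path


def copy_as_validation(train_path: str, validation_path: str) -> int:
    """Copy the training text file to the validation path.

    Returns the number of non-empty entries copied. Raises ValueError if a
    non-empty line does not contain a pipe delimiter.
    """
    lines = Path(train_path).read_text(encoding="utf-8").splitlines()
    entries = [line for line in lines if line.strip()]
    for line in entries:
        if "|" not in line:
            raise ValueError(f"not a pipe-delimited entry: {line!r}")
    # Plain byte-for-byte copy; no entries are held out.
    shutil.copyfile(train_path, validation_path)
    return len(entries)
```

Validating on a copy of the training data only confirms that the pipeline runs; it says nothing about how well the model generalizes.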
#### Train
After preparing your dataset and configuration file, you are ready to train. Select a generated configuration file, click train, then keep an eye on the console window for output.

Please be advised that integration is very much in its infancy.

**!**NOTE**!**: for now, you must place a `dvae.pth` file in `./models/tortoise/`. I'll add a way to automatically grab it during initialization soon.
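Until the download is automated, a pre-flight check along these lines can catch a missing `dvae.pth` before training starts. This is a sketch only: `require_dvae` is a hypothetical helper, and its default directory mirrors the `./models/tortoise/` path named in the note.

```python
from pathlib import Path


def require_dvae(models_dir: str = "./models/tortoise") -> Path:
    """Return the path to dvae.pth, or raise FileNotFoundError if absent."""
    dvae_path = Path(models_dir) / "dvae.pth"
    if not dvae_path.is_file():
        raise FileNotFoundError(
            f"Expected {dvae_path}; place dvae.pth there before training."
        )
    return dvae_path
```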
### Settings
This tab (should) hold a bunch of other settings, from tunables that shouldn't be tampered with, to settings pertaining to the web UI itself.
@ -53,7 +53,7 @@ def setup_args():
 'sample-batch-size': None,
 'embed-output-metadata': True,
 'latents-lean-and-mean': True,
-'voice-fixer': True,
+'voice-fixer': False, # I'm tired of long initialization of Colab notebooks
 'voice-fixer-use-cuda': True,
 'force-cpu-for-conditioning-latents': False,
 'device-override': None,
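With `voice-fixer` now defaulting to `False`, anyone who wants it has to opt in. The sketch below shows one way a defaults dictionary like this could be combined with user overrides. `DEFAULTS` reproduces only the keys visible in this hunk, and `resolve_args` is a hypothetical helper, not the `setup_args` function in this file.

```python
from typing import Any, Dict, Optional

# Only the keys visible in the hunk; the real dictionary has more entries.
DEFAULTS: Dict[str, Any] = {
    'sample-batch-size': None,
    'embed-output-metadata': True,
    'latents-lean-and-mean': True,
    'voice-fixer': False,
    'voice-fixer-use-cuda': True,
    'force-cpu-for-conditioning-latents': False,
    'device-override': None,
}


def resolve_args(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of DEFAULTS with known keys replaced by overrides."""
    merged = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULTS:
            # Reject typos instead of silently adding new settings.
            raise KeyError(f"unknown setting: {key}")
        merged[key] = value
    return merged


# Opting back in to voice-fixer for a single run:
settings = resolve_args({'voice-fixer': True})
```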