From d0aa6a62a873ff4352837dbf5fc7a54291a7b43b Mon Sep 17 00:00:00 2001
From: mrq
Date: Sun, 12 Mar 2023 04:47:31 +0000
Subject: [PATCH] Update 'Training'

---
 Training.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Training.md b/Training.md
index 97b2a20..8a4f670 100644
--- a/Training.md
+++ b/Training.md
@@ -58,6 +58,13 @@ This section will cover how to prepare a dataset for training.
 
 This tab will leverage any voice you have under the `./voices/` folder, and transcribes your voice samples using [openai/whisper](https://github.com/openai/whisper) to prepare an LJSpeech-formatted dataset to train against.
 
+It's not required to dedicate a small portion of your dataset for validation purposes, but it's recommended, as it helps remove data that's too short to be useful. Using a validation dataset also measures how well the finetune synthesizes speech from inputs it has not trained against.
+
+If you're transcribing English audio that's already stored as separate sound files (for example, one sentence per file), there's little need for a larger Whisper model, as English transcription is already quite accurate even with the smaller models.
+
+However, if you're transcribing something non-Latin (like Japanese), or need your source sliced into segments (if you have everything in one large file), then you should consider using a larger model for better timestamping (although the large model seems to have some problems providing accurate segmentation).
+* **!**NOTE**!**: be very careful about naively trusting how well the audio is segmented. Be sure to manually curate how well the audio was segmented.
+
 ## Generate Configuration
 
 This will generate the YAML necessary to feed into training. For documentation's sake, below are details for what each parameter does:
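
For illustration, a minimal sketch of what the transcription step described in this patch might look like, assuming the `openai/whisper` Python API, a hypothetical `./voices/myvoice/` folder, and a simple `filename|text|text` LJSpeech-style metadata layout (this is not the project's actual code):

```python
# Hypothetical sketch: transcribe each clip under ./voices/myvoice/ with
# openai/whisper and write an LJSpeech-style metadata file.
# The folder name, model size, and output file name are assumptions.
from pathlib import Path

import whisper  # pip install openai-whisper

model = whisper.load_model("base.en")  # smaller models are fine for English

voice_dir = Path("./voices/myvoice")
lines = []
for wav in sorted(voice_dir.glob("*.wav")):
    result = model.transcribe(str(wav))
    text = result["text"].strip()
    # LJSpeech metadata format: id|transcription|normalized transcription
    lines.append(f"{wav.stem}|{text}|{text}")

(voice_dir / "train.txt").write_text("\n".join(lines), encoding="utf-8")
```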
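
A sketch of one way the validation split described above could work, assuming the clips set aside are the ones too short to train against; the 0.6-second cutoff and the file names are illustrative guesses, not the project's actual behavior:

```python
# Hypothetical sketch: move entries whose audio is too short for training
# out of train.txt and into a separate validation list.
from pathlib import Path

from pydub import AudioSegment  # pip install pydub

voice_dir = Path("./voices/myvoice")
train, validation = [], []
for line in (voice_dir / "train.txt").read_text(encoding="utf-8").splitlines():
    if not line.strip():
        continue
    name = line.split("|", 1)[0]
    duration_s = AudioSegment.from_wav(voice_dir / f"{name}.wav").duration_seconds
    # Assumed cutoff: clips under 0.6 s go to validation instead of training.
    (validation if duration_s < 0.6 else train).append(line)

(voice_dir / "train.txt").write_text("\n".join(train), encoding="utf-8")
(voice_dir / "validation.txt").write_text("\n".join(validation), encoding="utf-8")
```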
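
And a sketch of slicing one large file into segments from Whisper's timestamps, assuming `pydub` for the audio handling and a Japanese-language source as in the example above; as the note in the patch warns, the resulting segments should be curated by hand:

```python
# Hypothetical sketch: slice one long recording into per-segment clips using
# whisper's segment timestamps. The model size and paths are assumptions.
import whisper
from pydub import AudioSegment  # pip install pydub

model = whisper.load_model("large")  # larger models timestamp non-English better
result = model.transcribe("./voices/myvoice/source.wav", language="ja")

audio = AudioSegment.from_wav("./voices/myvoice/source.wav")
for i, seg in enumerate(result["segments"]):
    start_ms = int(seg["start"] * 1000)
    end_ms = int(seg["end"] * 1000)
    clip = audio[start_ms:end_ms]  # pydub slices by milliseconds
    clip.export(f"./voices/myvoice/segment_{i:04d}.wav", format="wav")
    print(f"{i:04d}: {seg['text'].strip()}")  # review these segments by ear
```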