Update 'Collecting Samples'

master
mrq 2023-03-12 05:12:04 +07:00
parent 201c1d37c2
commit aec298b553
1 changed files with 2 additions and 2 deletions

@@ -19,9 +19,9 @@ Unlike training embeddings for AI image generations, preparing a "dataset" for v
As a general rule of thumb, try to source clips that aren't noisy, that contain solely the subject you are trying to clone, and that don't contain any non-words (like yells, guttural noises, etc.). If you must, run your source through a background music/noise remover (how to do so is an exercise left to the reader). It isn't entirely a detriment if you're unable to provide clean audio, however. Just be aware that you might have some headaches getting acceptable output.
-Nine times out of ten, you should be fine using as many clips as possible. There's (now) no preference between combining your audio into one file, or leaving it split. However, if you're aiming for a specific delivery, it *should* be best for you to narrow down to just using that as your provided source (for example, changing one word in a line).
+Nine times out of ten, you should be fine using as many clips as possible. If you're able to, leave your samples split per line; it will save you headaches later down the line. It's not the end of the world if your samples are in one large file, but when you go to finetune with them, the segments won't be as accurate as they would be had the clips already been split.
-There are no hard specifics on how many sources you need, or how long they should be.
+There are no hard specifics on how many sources you need, or how long they should be. The latents generation is good enough most of the time to handle anything from a short sentence to a large dataset.
If you're looking to trim your clips, in my opinion, ~~Audacity~~ Tenacity works well enough. Power users with FFMPEG already installed can simply use the provided conversion script in `.\tortoise\convert\`.
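
If your samples are stuck in one large file and you'd rather split them per line yourself, the idea can be roughly sketched as below. This is not the repo's conversion script or its segmentation logic. It's a minimal, standard-library-only illustration that cuts wherever the audio goes quiet for long enough. The function names (`split_on_silence`, `split_wav`) and every threshold are assumptions made up for this example, and it only handles 16-bit mono WAV.

```python
import wave
from array import array


def split_on_silence(samples, rate, threshold=500,
                     min_silence_s=0.3, min_clip_s=0.5):
    """Return (start, end) sample indices of the non-silent segments.

    All parameters are illustrative guesses: `threshold` is an absolute
    16-bit amplitude, the two durations are in seconds.
    """
    min_silence = int(min_silence_s * rate)
    min_clip = int(min_clip_s * rate)
    segments = []
    start = None    # index where the current segment began
    quiet_run = 0   # consecutive quiet samples inside the current segment
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                end = i - quiet_run + 1   # one past the last loud sample
                if end - start >= min_clip:
                    segments.append((start, end))
                start, quiet_run = None, 0
    if start is not None:                 # audio ended mid-segment
        end = len(samples) - quiet_run
        if end - start >= min_clip:
            segments.append((start, end))
    return segments


def split_wav(path, out_prefix, **kwargs):
    """Write each detected segment of a 16-bit mono WAV to its own file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = wf.getframerate()
        # WAV data is little-endian; this assumes a little-endian machine.
        samples = array("h", wf.readframes(wf.getnframes()))
    paths = []
    for n, (a, b) in enumerate(split_on_silence(samples, rate, **kwargs)):
        out = f"{out_prefix}_{n:03d}.wav"
        with wave.open(out, "wb") as of:
            of.setnchannels(1)
            of.setsampwidth(2)
            of.setframerate(rate)
            of.writeframes(samples[a:b].tobytes())
        paths.append(out)
    return paths
```

Something like `split_wav("source.wav", "line")` would then write `line_000.wav`, `line_001.wav`, and so on. Expect to tune the thresholds per source, and to still tidy up the odd clip by hand, since a plain amplitude check can't tell a breath from a word.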