Random voices appear at random when specific voice selected #42
Running
./scripts/tortoise_tts.py -O /tmp/clips -v chloe < file.txt
will sometimes produce audio clips using a randomly selected voice. This happens in about 3 of every 75 clips.
Is it possible to eliminate random voice selection altogether and force users to select a voice?
Beyond my scope, as:
I assume that what you take for a random voice is just how vanilla TorToiSe behaves (given the default model, simple variance will get you something that does not resemble the base model). If that's the case, you might want to lower the temperature to reduce variance.
I imagine you're trying to use it like 152334H/tortoise-tts-fast. Don't.
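On the temperature suggestion above, a minimal sketch of what a softmax temperature does in general may help. This is the textbook definition, not code taken from TorToiSe, and the logits are made-up illustrative values. Whether this mechanism explains the unexpected voice is an assumption, not something confirmed in this thread.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates probability on the highest-scoring candidate, so sampling varies less from run to run. Raising it flattens the distribution and increases variance.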
For context, I'm writing a novel with three narrators. The following script runs TorToiSe TTS on each plain text chapter:
In effect, I'm trying to instruct TorToiSe TTS to use a specific narrator for each chapter. Chapter 9 generates 121 WAV files. Of those, 28 are narrated in an unexpected random male voice and the remainder are in the desired narrator's female voice. IMO, this breaks the principle of least astonishment. (As a user, I've explicitly set the voice to use, but TorToiSe seems to pick a different voice at random.)
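For readers who want to reproduce the setup, here is a rough sketch of a per-chapter driver. It is not the script referred to above. It uses only the -O and -v flags and the stdin redirection shown in the command earlier in this issue. The chapter file names, the directory layout, and every voice name other than chloe are hypothetical.

```python
import subprocess
from pathlib import Path

SCRIPT = "./scripts/tortoise_tts.py"  # path as used earlier in this issue

# Hypothetical chapter-to-voice mapping; only "chloe" appears in this issue.
NARRATORS = {
    "chapter-01.txt": "chloe",
    "chapter-02.txt": "narrator_b",
    "chapter-03.txt": "narrator_c",
}


def build_command(voice, out_dir):
    """Build the argument list using only the -O and -v flags."""
    return [SCRIPT, "-O", str(out_dir), "-v", voice]


def narrate(chapter_path, voice, out_root):
    """Feed one chapter to the script on stdin, one output directory per chapter."""
    chapter_path = Path(chapter_path)
    out_dir = Path(out_root) / chapter_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    with chapter_path.open("rb") as text:
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(build_command(voice, out_dir), stdin=text, check=True)


def narrate_all(chapter_dir, out_root):
    """Narrate every mapped chapter with its explicitly assigned voice."""
    for name, voice in NARRATORS.items():
        narrate(Path(chapter_dir) / name, voice, out_root)
```

Calling narrate_all("chapters", "/tmp/clips") would run the script once per chapter. Passing the voice explicitly on every call only makes the intent unambiguous; it does not by itself prevent the unexpected voice described here.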
It looks like --cvvp-amount CVVP_AMOUNT can reduce the likelihood of multiple speakers, which is what I think is happening, but there's no documentation on the max value (0 being default). Is the float's range from 0 to 1?
I looked at the docs for temperature, and there are a few issues:
Which option controls the variance: --temperature or --diffusion-temperature?
If --temperature controls the variance, what are its min/max/default values? The help only states "The softmax temperature of the autoregressive model."
By re-sampling with a higher-quality audio source and tweaking the command-line arguments, the same voice is used consistently throughout the narration: