Random voices appear at random when specific voice selected #42


thangalin · 2023-03-13T20:37:33Z

thangalin commented

2023-03-13 20:37:33 +00:00

Running `./scripts/tortoise_tts.py -O /tmp/clips -v chloe < file.txt` will sometimes produce audio clips using a randomly selected voice. This happens in about 3 out of every 75 clips.

Is it possible to eliminate the random voice selection altogether, to force users to select a voice?

Running `./scripts/tortoise_tts.py -O /tmp/clips -v chloe < file.txt` will sometimes produce audio clips using a randomly selected voice. Happens about 3 times out of every 75 clips. Is it possible to eliminate the random voice selection altogether, to force users to select a voice?

mrq commented

2023-03-13 22:01:22 +00:00

> Running ./scripts/tortoise_tts.py

Beyond my scope, as:

- I haven't touched that file at all.
- I'm not supporting manually invoking tortoise-tts.
- My priority lies with mrq/ai-voice-cloning (https://git.ecker.tech/mrq/ai-voice-cloning), which uses this tortoise-tts as a dependency.
  - CLI support isn't implemented at the moment, as that's very low priority.

I can only assume that what you're hearing as a random voice is just how vanilla TorToiSe behaves: given the default model, simple variance will get you something that doesn't resemble the base model. If that's the case, you might want to lower the temperature to reduce variance.

I imagine you're trying to use it like 152334H/tortoise-tts-fast (https://github.com/152334H/tortoise-tts-fast/). Don't.


mrq closed this issue

2023-03-13 22:01:22 +00:00

thangalin commented

2023-03-29 20:08:21 +00:00

For context, I'm writing a novel with three narrators. The following script runs TorToiSe TTS on each plain-text chapter:

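function txt_to_tts() {
  # Stop early if the working directory is missing.
  pushd "$HOME/archives/tortoise-tts" || return

  # Chapter files are named NN-voice.txt, e.g. 09-chloe.txt.
  for i in "$HOME"/dev/novel/chapter/??-*.txt; do
    F=$(basename "$i" .txt)    # file name without directory or extension
    VOICE=${F:3}               # everything after the "NN-" prefix
    CHAPTER=${F:0:2}           # first two characters
    D=$(dirname "$i")
    CLIPS="$D/../audio/$CHAPTER-$VOICE"
    mkdir -p "$CLIPS"
    "$HOME/dev/tts/scripts/tortoise_tts.py" -p high_quality -O "$CLIPS" -v "$VOICE" < "$i"
  done

  popd
}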

In effect, I'm trying to instruct TorToiSe TTS to use a specific narrator for each chapter. Chapter 9 generates 121 WAV files. Of those, 28 are narrated in an unexpected random male voice and the remainder are in the desired narrator's female voice. IMO, this breaks the principle of least astonishment. (As a user, I've explicitly set the voice to use, but TorToiSe seems to pick a different voice at random.)
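Since the voice passed to -v comes straight from the file name, it can help to check that split in isolation before suspecting the model. A minimal sketch, assuming chapter files are named NN-voice.txt as the ??-*.txt glob suggests. The helper name split_chapter_name and the sample path are made up for illustration and are not part of tortoise-tts.

```shell
# Hypothetical helper: split "NN-voice.txt" into chapter number and voice name.
split_chapter_name() {
  name=$(basename "$1" .txt)   # drop the directory and the .txt extension
  # Text before the first hyphen is the chapter; the rest is the voice.
  printf '%s %s\n' "${name%%-*}" "${name#*-}"
}

split_chapter_name "/tmp/novel/chapter/09-chloe.txt"   # prints "09 chloe"
```

If this prints the expected voice for every chapter file, the script is passing the right name, and the unexpected voice is coming from generation itself.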

It looks like --cvvp-amount CVVP_AMOUNT can reduce the likelihood of multiple speakers, which is what I think is happening, but there's no documentation on the max value (0 being default). Is the float's range from 0 to 1?

I looked at the docs for temperature, and there are a few issues:

- Which temperature controls whether a random voice is used versus the voice specified on the command-line? --temperature or --diffusion-temperature?
- If --temperature controls the variance, what are its min/max/default values? The help only states "The softmax temperature of the autoregressive model."
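Without documented ranges, one way to narrow this down is to render the same text at a few candidate settings and compare the clips by ear. The following is only a dry-run sketch: it prints the commands instead of running them. The function name print_cvvp_sweep, the candidate values 0, 0.25 and 0.5, and the output directories are assumptions for illustration, not documented limits or defaults.

```shell
# Hypothetical dry run: print one command per candidate --cvvp-amount value.
# The candidate values are guesses, not documented limits.
print_cvvp_sweep() {
  voice=$1
  input=$2
  for amount in 0 0.25 0.5; do
    # Each value gets its own (assumed) output directory so clips don't mix.
    printf '%s\n' "./scripts/tortoise_tts.py --cvvp-amount $amount -p high_quality -O /tmp/clips/cvvp-$amount -v $voice < $input"
  done
}

print_cvvp_sweep chloe file.txt
```

The same loop could vary --temperature instead, again with values chosen by trial, since the help text gives no range.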


thangalin reopened this issue

2023-03-29 20:08:21 +00:00

thangalin commented

2023-04-09 18:31:52 +00:00

By re-sampling from a higher-quality audio source and tweaking the command-line arguments, the same voice is now used consistently throughout the narration:

./scripts/tortoise_tts.py \
  --cvvp-amount 0.5 \
  -p high_quality \
  -O /tmp/audio/directory \
  -v cassandra < chapter/03.txt


thangalin closed this issue

2023-04-09 18:31:52 +00:00


Reference: mrq/tortoise-tts#42