This commit is contained in:
James Betker 2022-05-06 00:11:10 -06:00
parent b327be56c6
commit ffd0238a16
32 changed files with 77 additions and 34 deletions


@ -9,6 +9,11 @@ This repo contains all the code needed to run Tortoise TTS in inference mode.
### New features
#### v2.2; 2022/5/5
- Added several new voices from the training set.
- Automated redaction. Wrap text in brackets to use it to prompt the model without having it spoken.
- Bug fixes
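
The bracket convention above can be illustrated on the text side with a small sketch. This is not the code Tortoise uses for redaction; `split_redacted` is a hypothetical helper that only shows which words would reach the model as a prompt and which would remain in the spoken output.

```python
import re


def split_redacted(text):
    """Return (prompt_text, spoken_text) for a bracket-annotated string.

    Hypothetical helper for illustration only; not part of the Tortoise API.
    """
    # The prompt keeps the bracketed words but drops the bracket characters.
    prompt_text = re.sub(r"[\[\]]", "", text)
    # The spoken text drops every bracketed span entirely.
    spoken_text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse the whitespace left behind by the removal.
    spoken_text = re.sub(r"\s+", " ", spoken_text).strip()
    return prompt_text, spoken_text


prompt, spoken = split_redacted("[I am so sad,] the weather is awful today.")
print(prompt)
print(spoken)
```

Here the bracketed clause steers the delivery while only the remainder of the sentence is meant to be heard.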
#### v2.1; 2022/5/2
- Added ability to produce totally random voices.
- Added ability to download voice conditioning latent via a script, and then use a user-provided conditioning latent.
@ -95,11 +100,9 @@ For the those in the ML space: this is created by projecting a random vector ont
### Provided voices
This repo comes with several pre-packaged voices. You will be familiar with many of them. :)
Most of the provided voices were not found in the training set. Experimentally, it seems that voices from the training set
produce more realistic outputs then those outside of the training set. Any voice prepended with "train" came from the
training set.
This repo comes with several pre-packaged voices. Voices prepended with "train_" came from the training set and perform
far better than the others. If your goal is high-quality speech, I recommend you pick one of them. If you want to see
what Tortoise can do for zero-shot mimicking, take a look at the others.
### Adding a new voice

Binary file not shown.

Binary file not shown.

BIN
examples/prompting/sad.mp3 Normal file

Binary file not shown.

Binary file not shown.


@ -1,4 +0,0 @@
[ViewState]
Mode=
Vid=
FolderType=Generic


@ -6,7 +6,7 @@ with open("README.md", "r", encoding="utf-8") as fh:
setuptools.setup(
name="TorToiSe",
packages=setuptools.find_packages(),
version="2.1.3",
version="2.2.0",
author="James Betker",
author_email="james@adamant.ai",
description="A high quality multi-voice text-to-speech library",


@ -284,8 +284,6 @@ class UnivNetGenerator(nn.Module):
self.remove_weight_norm()
def remove_weight_norm(self):
print('Removing weight norm...')
nn.utils.remove_weight_norm(self.conv_pre)
for layer in self.conv_post:


@ -137,7 +137,7 @@ class TacotronSTFT(torch.nn.Module):
self.stft_fn = STFT(filter_length, hop_length, win_length)
from librosa.filters import mel as librosa_mel_fn
mel_basis = librosa_mel_fn(
sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
sr=sampling_rate, n_fft=filter_length, n_mels=n_mel_channels, fmin=mel_fmin, fmax=mel_fmax)
mel_basis = torch.from_numpy(mel_basis).float()
self.register_buffer('mel_basis', mel_basis)


@ -66,7 +66,7 @@ class Wav2VecAlignment:
logits = logits[0]
pred_string = self.tokenizer.decode(logits.argmax(-1).tolist())
fixed_expectation = max_alignment(expected_text, pred_string)
fixed_expectation = max_alignment(expected_text.lower(), pred_string)
w2v_compression = orig_len // logits.shape[0]
expected_tokens = self.tokenizer.encode(fixed_expectation)
expected_chars = list(fixed_expectation)
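
Review note: the `.lower()` change presumably matters because the decoded prediction is lowercase, so capital letters in the expected text can never line up with it. The sketch below illustrates the effect with `difflib` from the standard library; `matched_chars` is a hypothetical stand-in and not the `max_alignment` function used in this file.

```python
from difflib import SequenceMatcher


def matched_chars(expected, predicted):
    """Count characters that a case-sensitive matcher can line up."""
    matcher = SequenceMatcher(None, expected, predicted, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


predicted = "hello world"  # stand-in for a lowercase decoded prediction
cased = matched_chars("Hello World", predicted)
lowered = matched_chars("Hello World".lower(), predicted)
print(cased, lowered)
```

With the original casing, the capitalized characters are left unmatched; after lowercasing, every character aligns.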
@ -100,7 +100,10 @@ class Wav2VecAlignment:
break
pop_till_you_win()
assert len(expected_tokens) == 0, "This shouldn't happen. My coding sucks."
if not (len(expected_tokens) == 0 and len(alignments) == len(expected_text)):
torch.save([audio, expected_text], 'alignment_debug.pth')
assert False, "Something went wrong with the alignment algorithm. I've dumped a file, 'alignment_debug.pth', to " \
"your current working directory. Please report this along with the file so it can get fixed."
# Now fix up alignments. Anything with -1 should be interpolated.
alignments.append(orig_len) # This'll get removed but makes the algorithm below more readable.
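
Review note: the hunk ends before the interpolation itself is shown. As a minimal sketch of what "anything with -1 should be interpolated" could look like, assuming linear interpolation between the nearest known positions and a sentinel end value like the appended `orig_len`, consider the following. `interpolate_missing` is a hypothetical name, and the handling of a leading -1 is an assumption.

```python
def interpolate_missing(alignments, total_len):
    """Fill -1 entries by linear interpolation between known neighbours."""
    # Append a sentinel end position so trailing gaps have a right-hand anchor.
    padded = list(alignments) + [total_len]
    if padded[0] == -1:
        padded[0] = 0  # assumption: an unknown first position starts at 0
    i = 0
    while i < len(padded):
        if padded[i] != -1:
            i += 1
            continue
        # Find the end of this run of unknown positions.
        j = i
        while padded[j] == -1:
            j += 1
        left, right = padded[i - 1], padded[j]
        gap = j - i + 1
        for k in range(i, j):
            # Spread the unknown positions evenly between the two anchors.
            padded[k] = left + (right - left) * (k - i + 1) // gap
        i = j
    # Drop the sentinel again before returning.
    return padded[:-1]


print(interpolate_missing([0, -1, -1, 30], 40))
```

The sentinel is why a trailing run of -1 values needs no special case: the loop always finds a known value to its right.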

Binary file not shown.



@ -1,4 +1,16 @@
<html><head><title>These words were never spoken.</title></head><body><h1>Handpicked results</h1><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/atkins_mha.mp3" type="audio/mp3"></audio><br>
<html><head><meta charset="UTF-8"><title>TorToiSe - These words were never spoken.</title></head>
<body>
<h1>Introduction 🐢 </h1>
<p>TorToiSe is a text-to-speech program built in April 2022 by jbetker@. TorToiSe is open source, with trained model weights
available at <a href="https://github.com/neonbjb/tortoise-tts">https://github.com/neonbjb/tortoise-tts</a></p>
<p>This page demonstrates some of the results of TorToiSe.</p>
<h1>Handpicked results 🐢 </h1>
<p>Following are several particularly good results generated by the model.</p>
<h2>Short-form</h2>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/atkins_mha.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/atkins_omicron.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/atkins_value.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/daniel_craig_dumbledore.mp3" type="audio/mp3"></audio><br>
@ -16,14 +28,28 @@
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/patrick_stewart_secret_of_life.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/robert_deniro_review.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorites/william_shatner_spacecraft_interview.mp3" type="audio/mp3"></audio><br>
<h1>Handpicked longform result:<h1><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorite_riding_hood.mp3" type="audio/mp3"></audio><br>
<h1>Compared to Tacotron2 (with the LJSpeech voice):</h1><table><th>Tacotron2+Waveglow</th><th>TorToiSe</th><tr><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/2-tacotron2.mp3" type="audio/mp3"></audio><br>
<h2>Long-form</h2>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/favorite_riding_hood.mp3" type="audio/mp3"></audio><br>
<h1>Compared to Tacotron2 (with the LJSpeech voice): 🐢 </h1>
<p>LJSpeech is a popular dataset used to train small-scale TTS models. TorToiSe is a multi-voice model. The following shows how
it renders the LJSpeech voice with no fine-tuning, compared with results for the same text from the popular Tacotron2
model paired with the WaveGlow vocoder:</p>
<table><th>Tacotron2+Waveglow</th><th>TorToiSe</th><tr><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/2-tacotron2.mp3" type="audio/mp3"></audio><br>
</td><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/2-tortoise.mp3" type="audio/mp3"></audio><br>
</td></tr><tr><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/3-tacotron2.mp3" type="audio/mp3"></audio><br>
</td><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/3-tortoise.mp3" type="audio/mp3"></audio><br>
</td></tr><tr><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/4-tacotron2.mp3" type="audio/mp3"></audio><br>
</td><td><audio controls="" style="width: 300px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/tacotron_comparison/4-tortoise.mp3" type="audio/mp3"></audio><br>
</td></tr></table><h1>Various spoken texts for all voices:<h1><table><th>text</th><th>angie</th><th>daniel</th><th>deniro</th><th>emma</th><th>freeman</th><th>geralt</th><th>halle</th><th>jlaw</th><th>lj</th><th>myself</th><th>pat</th><th>snakes</th><th>tom</th><th>train_atkins</th><th>train_dotrice</th><th>train_kennard</th><th>weaver</th><th>william</th>
</td></tr></table>
<h1>All Results 🐢</h1>
<p> Following are all the results from which the hand-picked results were drawn. Also included is the reference
audio that the program is trying to mimic. This will give you a better sense of how TorToiSe really performs.</p>
<h2>Short-form</h2>
<table><th>text</th><th>angie</th><th>daniel</th><th>deniro</th><th>emma</th><th>freeman</th><th>geralt</th><th>halle</th><th>jlaw</th><th>lj</th><th>myself</th><th>pat</th><th>snakes</th><th>tom</th><th>train_atkins</th><th>train_dotrice</th><th>train_kennard</th><th>weaver</th><th>william</th>
<tr><td>reference clip</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/angie/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/daniel/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/deniro/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/emma/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/freeman/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/geralt/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/halle/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/jlaw/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/lj/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/myself/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/pat/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/snakes/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/tom/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/train_atkins/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/train_dotrice/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/train_kennard/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/weaver/1.wav" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/voices/william/1.wav" type="audio/mp3"></audio></td></tr>
<tr><td>autoregressive_ml</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/autoregressive_ml/william.mp3" type="audio/mp3"></audio></td></tr>
<tr><td>bengio_it_needs_to_know_what_is_bad</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/bengio_it_needs_to_know_what_is_bad/william.mp3" type="audio/mp3"></audio></td></tr>
@ -44,19 +70,36 @@
<tr><td>tacotron2_sample3</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample3/william.mp3" type="audio/mp3"></audio></td></tr>
<tr><td>tacotron2_sample4</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/tacotron2_sample4/william.mp3" type="audio/mp3"></audio></td></tr>
<tr><td>watts_this_is_the_real_secret_of_life</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/watts_this_is_the_real_secret_of_life/william.mp3" type="audio/mp3"></audio></td></tr>
<tr><td>wilde_nowadays_people_know_the_price</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/william.mp3" type="audio/mp3"></audio></td></tr></table><h1>Longform result for all voices:</h1><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/angelina.mp3" 
type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/craig.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/deniro.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/emma.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/freeman.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/geralt.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/halle.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/jlaw.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/lj.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/myself.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/pat.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/snakes.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/tom.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/weaver.mp3" type="audio/mp3"></audio><br>
<audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/william.mp3" type="audio/mp3"></audio><br>
<tr><td>wilde_nowadays_people_know_the_price</td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/angie.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/daniel.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/deniro.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/emma.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/freeman.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/geralt.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/halle.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/jlaw.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/lj.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source 
src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/myself.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/pat.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/snakes.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/tom.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/train_atkins.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/train_dotrice.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/train_kennard.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/weaver.mp3" type="audio/mp3"></audio></td><td><audio controls="" style="width: 150px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/various/wilde_nowadays_people_know_the_price/william.mp3" type="audio/mp3"></audio></td></tr></table>
<h2>Long-form</h2>
<b>Angelina:</b> <audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/angelina.mp3" type="audio/mp3"></audio><br>
<b>Craig:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/craig.mp3" type="audio/mp3"></audio><br>
<b>Deniro:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/deniro.mp3" type="audio/mp3"></audio><br>
<b>Emma:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/emma.mp3" type="audio/mp3"></audio><br>
<b>Freeman:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/freeman.mp3" type="audio/mp3"></audio><br>
<b>Geralt:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/geralt.mp3" type="audio/mp3"></audio><br>
<b>Halle:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/halle.mp3" type="audio/mp3"></audio><br>
<b>Jlaw:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/jlaw.mp3" type="audio/mp3"></audio><br>
<b>LJ:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/lj.mp3" type="audio/mp3"></audio><br>
<b>Myself:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/myself.mp3" type="audio/mp3"></audio><br>
<b>Pat:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/pat.mp3" type="audio/mp3"></audio><br>
<b>Snakes:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/snakes.mp3" type="audio/mp3"></audio><br>
<b>Tom:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/tom.mp3" type="audio/mp3"></audio><br>
<b>Weaver:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/weaver.mp3" type="audio/mp3"></audio><br>
<b>William:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/riding_hood/william.mp3" type="audio/mp3"></audio><br>
<h1>Prompt Engineering 🐢</h1>
<p>Tortoise is capable of "prompt engineering": tone and prosody are affected by the emotion expressed in the words
fed to the program. For example, prompting the model with "[I am so angry,] I went to the park and threw a ball" will
result in it outputting "I went to the park and threw a ball" in an angry tone.</p>
<p>The following are a few examples of different prompts. The effect is subtle but definitely there. Many voices are
less affected by this.</p>
<b>Angry:</b> <audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/prompting/angry.mp3" type="audio/mp3"></audio><br>
<b>Sad:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/prompting/sad.mp3" type="audio/mp3"></audio><br>
<b>Happy:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/prompting/happy.mp3" type="audio/mp3"></audio><br>
<b>Scared:</b><audio controls="" style="width: 600px;"><source src="https://github.com/neonbjb/tortoise-tts/raw/main/examples/prompting/scared.mp3" type="audio/mp3"></audio><br>
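<p>The bracket convention used for these prompts can be sketched in a few lines of Python. This is only an
illustration of the text side of the idea: <code>split_prompt</code> is a hypothetical helper, not part of the
Tortoise API, and it does not attempt the audio redaction that Tortoise performs on the generated speech.</p>
<pre>
```python
import re

# Hypothetical helper, not part of the Tortoise API. It only illustrates the
# bracket convention: bracketed text steers the tone but is not spoken.
BRACKETED = re.compile(r"\[[^\]]*\]")


def split_prompt(text):
    """Return (full_prompt, spoken_text) for a bracket-annotated prompt."""
    # The model sees every word; only the bracket characters are dropped.
    full_prompt = " ".join(text.replace("[", "").replace("]", "").split())
    # Only the text outside the brackets should remain in the final audio.
    spoken_text = " ".join(BRACKETED.sub(" ", text).split())
    return full_prompt, spoken_text


full_prompt, spoken_text = split_prompt(
    "[I am so angry,] I went to the park and threw a ball"
)
print(full_prompt)
print(spoken_text)
```
</pre>
<p>Here <code>full_prompt</code> is the text the model would be conditioned on, and <code>spoken_text</code> is what
should be heard in the output.</p>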
</body></html>