forked from mrq/tortoise-tts

Add read script

James Betker 2022-04-10 19:29:42 -06:00
parent 7e29c68336
commit a5f4382a10
3 changed files with 134 additions and 6 deletions

data/riding_hood.txt Normal file

@@ -0,0 +1,54 @@
Once upon a time there lived in a certain village a little country girl, the prettiest creature who was ever seen. Her mother was excessively fond of her; and her grandmother doted on her still more. This good woman had a little red riding hood made for her. It suited the girl so extremely well that everybody called her Little Red Riding Hood.
One day her mother, having made some cakes, said to her, "Go, my dear, and see how your grandmother is doing, for I hear she has been very ill. Take her a cake, and this little pot of butter."
Little Red Riding Hood set out immediately to go to her grandmother, who lived in another village.
As she was going through the wood, she met with a wolf, who had a very great mind to eat her up, but he dared not, because of some woodcutters working nearby in the forest. He asked her where she was going. The poor child, who did not know that it was dangerous to stay and talk to a wolf, said to him, "I am going to see my grandmother and carry her a cake and a little pot of butter from my mother."
"Does she live far off?" said the wolf.
"Oh I say," answered Little Red Riding Hood; "it is beyond that mill you see there, at the first house in the village."
"Well," said the wolf, "and I'll go and see her too. I'll go this way and go you that, and we shall see who will be there first."
The wolf ran as fast as he could, taking the shortest path, and the little girl took a roundabout way, entertaining herself by gathering nuts, running after butterflies, and gathering bouquets of little flowers. It was not long before the wolf arrived at the old woman's house. He knocked at the door: tap, tap.
"Who's there?"
"Your grandchild, Little Red Riding Hood," replied the wolf, counterfeiting her voice; "who has brought you a cake and a little pot of butter sent you by mother."
The good grandmother, who was in bed, because she was somewhat ill, cried out, "Pull the bobbin, and the latch will go up."
The wolf pulled the bobbin, and the door opened, and then he immediately fell upon the good woman and ate her up in a moment, for it had been more than three days since he had eaten. He then shut the door and got into the grandmother's bed, expecting Little Red Riding Hood, who came some time afterwards and knocked at the door: tap, tap.
"Who's there?"
Little Red Riding Hood, hearing the big voice of the wolf, was at first afraid; but believing her grandmother had a cold and was hoarse, answered, "It is your grandchild Little Red Riding Hood, who has brought you a cake and a little pot of butter mother sends you."
The wolf cried out to her, softening his voice as much as he could, "Pull the bobbin, and the latch will go up."
Little Red Riding Hood pulled the bobbin, and the door opened.
The wolf, seeing her come in, said to her, hiding himself under the bedclothes, "Put the cake and the little pot of butter upon the stool, and come get into bed with me."
Little Red Riding Hood took off her clothes and got into bed. She was greatly amazed to see how her grandmother looked in her nightclothes, and said to her, "Grandmother, what big arms you have!"
"All the better to hug you with, my dear."
"Grandmother, what big legs you have!"
"All the better to run with, my child."
"Grandmother, what big ears you have!"
"All the better to hear with, my child."
"Grandmother, what big eyes you have!"
"All the better to see with, my child."
"Grandmother, what big teeth you have got!"
"All the better to eat you up with."
And, saying these words, this wicked wolf fell upon Little Red Riding Hood, and ate her all up.


@@ -5,7 +5,7 @@ import torch
 import torch.nn.functional as F
 import torchaudio
-from api_new_autoregressive import TextToSpeech, load_conditioning
+from api import TextToSpeech, load_conditioning
 from utils.audio import load_audio
 from utils.tokenizer import VoiceBpeTokenizer
@@ -18,6 +18,7 @@ if __name__ == '__main__':
         'harris': ['voices/harris/1.wav', 'voices/harris/2.wav'],
         'lescault': ['voices/lescault/1.wav', 'voices/lescault/2.wav'],
         'otto': ['voices/otto/1.wav', 'voices/otto/2.wav'],
+        'obama': ['voices/obama/1.wav', 'voices/obama/2.wav'],
         # Female voices
         'atkins': ['voices/atkins/1.wav', 'voices/atkins/2.wav'],
         'grace': ['voices/grace/1.wav', 'voices/grace/2.wav'],
@@ -27,8 +28,8 @@ if __name__ == '__main__':
     parser = argparse.ArgumentParser()
     parser.add_argument('-text', type=str, help='Text to speak.', default="I am a language model that has learned to speak.")
-    parser.add_argument('-voice', type=str, help='Use a preset conditioning voice (defined above). Overrides cond_path.', default='dotrice,harris,lescault,otto,atkins,grace,kennard,mol')
+    parser.add_argument('-voice', type=str, help='Use a preset conditioning voice (defined above). Overrides cond_path.', default='obama,dotrice,harris,lescault,otto,atkins,grace,kennard,mol')
-    parser.add_argument('-num_samples', type=int, help='How many total outputs the autoregressive transformer should produce.', default=32)
+    parser.add_argument('-num_samples', type=int, help='How many total outputs the autoregressive transformer should produce.', default=128)
     parser.add_argument('-batch_size', type=int, help='How many samples to process at once in the autoregressive model.', default=16)
     parser.add_argument('-num_diffusion_samples', type=int, help='Number of outputs that progress to the diffusion stage.', default=16)
     parser.add_argument('-output_path', type=str, help='Where to store outputs.', default='results/')
@@ -38,9 +39,6 @@ if __name__ == '__main__':
     tts = TextToSpeech(autoregressive_batch_size=args.batch_size)
     for voice in args.voice.split(','):
-        tokenizer = VoiceBpeTokenizer()
-        text = torch.IntTensor(tokenizer.encode(args.text)).unsqueeze(0).cuda()
-        text = F.pad(text, (0,1)) # This may not be necessary.
         cond_paths = preselected_cond_voices[voice]
         conds = []
         for cond_path in cond_paths:

read.py Normal file

@@ -0,0 +1,76 @@
import argparse
import os

import torch
import torch.nn.functional as F
import torchaudio

from api import TextToSpeech, load_conditioning
from utils.audio import load_audio
from utils.tokenizer import VoiceBpeTokenizer


def split_and_recombine_text(texts, desired_length=200, max_len=300):
    # TODO: also split across '!' and '?'. Attempt to keep quotations together.
    # Split on '.' and greedily merge neighboring fragments until each chunk
    # is at least desired_length characters long, without exceeding max_len.
    texts = [s.strip() + "." for s in texts.split('.')]
    i = 0
    while i < len(texts):
        ltxt = texts[i]
        if len(ltxt) >= desired_length or i == len(texts)-1:
            i += 1
            continue
        if len(ltxt) + len(texts[i+1]) > max_len:
            i += 1
            continue
        texts[i] = f'{ltxt} {texts[i+1]}'
        texts.pop(i+1)
    return texts


if __name__ == '__main__':
    # These are voices drawn randomly from the training set. You are free to substitute your own voices in, but testing
    # has shown that the model does not generalize to new voices very well.
    preselected_cond_voices = {
        # Male voices
        'dotrice': ['voices/dotrice/1.wav', 'voices/dotrice/2.wav'],
        'harris': ['voices/harris/1.wav', 'voices/harris/2.wav'],
        'lescault': ['voices/lescault/1.wav', 'voices/lescault/2.wav'],
        'otto': ['voices/otto/1.wav', 'voices/otto/2.wav'],
        'obama': ['voices/obama/1.wav', 'voices/obama/2.wav'],
        'carlin': ['voices/carlin/1.wav', 'voices/carlin/2.wav'],
        # Female voices
        'atkins': ['voices/atkins/1.wav', 'voices/atkins/2.wav'],
        'grace': ['voices/grace/1.wav', 'voices/grace/2.wav'],
        'kennard': ['voices/kennard/1.wav', 'voices/kennard/2.wav'],
        'mol': ['voices/mol/1.wav', 'voices/mol/2.wav'],
        'lj': ['voices/lj/1.wav', 'voices/lj/2.wav'],
    }

    parser = argparse.ArgumentParser()
    parser.add_argument('-textfile', type=str, help='A file containing the text to read.', default="data/riding_hood.txt")
    parser.add_argument('-voice', type=str, help='Use a preset conditioning voice (defined above). Overrides cond_path.', default='dotrice')
    parser.add_argument('-num_samples', type=int, help='How many total outputs the autoregressive transformer should produce.', default=256)
    parser.add_argument('-batch_size', type=int, help='How many samples to process at once in the autoregressive model.', default=16)
    parser.add_argument('-output_path', type=str, help='Where to store outputs.', default='results/longform/')
    args = parser.parse_args()
    os.makedirs(args.output_path, exist_ok=True)

    with open(args.textfile, 'r', encoding='utf-8') as f:
        text = f.read()
    texts = split_and_recombine_text(text)

    tts = TextToSpeech(autoregressive_batch_size=args.batch_size)
    priors = []
    for j, text in enumerate(texts):
        cond_paths = preselected_cond_voices[args.voice]
        # Condition on the most recent generations in addition to the preset
        # voice clips, which helps keep the voice consistent across chunks.
        conds = priors.copy()
        for cond_path in cond_paths:
            c = load_audio(cond_path, 22050)
            conds.append(c)
        gen = tts.tts(text, conds, num_autoregressive_samples=args.num_samples, temperature=.7, top_p=.7)
        torchaudio.save(os.path.join(args.output_path, f'{j}.wav'), gen.squeeze(0).cpu(), 24000)
        # Keep a rolling window of at most the two most recent outputs.
        priors.append(torchaudio.functional.resample(gen, 24000, 22050).squeeze(0))
        while len(priors) > 2:
            priors.pop(0)
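For a quick sanity check, the chunking and rolling-prior logic from read.py can be exercised standalone. The function below reproduces the committed `split_and_recombine_text` so the sketch runs without the tortoise-tts package; the sample sentence and the string "clips" standing in for audio tensors are illustrative only.

```python
def split_and_recombine_text(text, desired_length=200, max_len=300):
    # Same greedy merge as read.py: combine sentence fragments until a chunk
    # reaches desired_length characters, never exceeding max_len.
    texts = [s.strip() + "." for s in text.split('.')]
    i = 0
    while i < len(texts):
        ltxt = texts[i]
        if len(ltxt) >= desired_length or i == len(texts) - 1:
            i += 1
            continue
        if len(ltxt) + len(texts[i + 1]) > max_len:
            i += 1
            continue
        texts[i] = f'{ltxt} {texts[i+1]}'
        texts.pop(i + 1)
    return texts

chunks = split_and_recombine_text("One. Two. Three. Four.", desired_length=10, max_len=15)
# Note a quirk inherited from splitting on '.': the empty fragment after the
# final period survives as a lone "." merged into the last chunk.

# The prior-conditioning window from the main loop, with strings standing in
# for generated audio: at most the two most recent outputs are retained.
priors = []
for clip in ['a', 'b', 'c', 'd']:
    priors.append(clip)
    while len(priors) > 2:
        priors.pop(0)
```

Splitting across `!` and `?` (the TODO in the committed code) would only require changing the initial split; the merge loop is delimiter-agnostic.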