From 048b0996bcaa0e9bb9584c389894d1cf09ac4e89 Mon Sep 17 00:00:00 2001
From: James Betker
Date: Thu, 10 Mar 2022 23:32:35 -0700
Subject: [PATCH] Update readme

---
 README.md | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 61dd065..1013f4e 100644
--- a/README.md
+++ b/README.md
@@ -36,7 +36,32 @@ Based on [ImprovedDiffusion by openai](https://github.com/openai/improved-diffus
 
 ## How do I use this?
 
-
+Check out the colab: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR?usp=sharing
+
+Or on a computer with a GPU (with >=16GB of VRAM):
+```shell
+git clone https://github.com/neonbjb/tortoise-tts.git
+cd tortoise-tts
+pip install -r requirements.txt
+python do_tts.py
+```
+
+## Hand-picked TTS samples
+
+I generated ~250 samples from 23 text prompts and 8 voices. The text prompts have never been seen by the model. The
+voices were pulled from the training set.
+
+All of the samples can be found in the results/ folder of this repo.
+
+I handpicked a few to show what the model is capable of:
+[Atkins - Road not taken](results/favorites/atkins_road_not_taken.wav)
+[Dotrice - Rolling Stone interview](results/favorites/dotrice_rollingstone.wav)
+[Dotrice - 'Ornaments' from tacotron test set](results/favorites/dotrice_tacotron_samp1.wav)
+[Kennard - 'Acute emotional intelligence' from tacotron test set](results/favorites/kennard_tacotron_samp2.wav)
+[Mol - Because I could not stop for death](results/favorites/mol_dickenson.wav)
+[Mol - Obama](results/favorites/mol_obama.wav)
+
+Prosody is remarkably good for poetry, despite the fact that it was never trained on poetry.
 
 ## How do I train this?
 
@@ -44,4 +69,10 @@ Frankly - you don't. Building this model has been a labor of love for me, consum
 resources for the better part of 6 months.
 It uses a dataset I've gathered, refined and transcribed that consists of a lot of audio data which I cannot distribute because of copywrite or no open licenses.
 
-With that said, I'm willing to help you out if you really want to give it a shot. DM me.
\ No newline at end of file
+With that said, I'm willing to help you out if you really want to give it a shot. DM me.
+
+## Looking forward
+
+I'm not satisfied with this yet. Treat this as a "sneak peek" and check back in a couple of months. I think the concept
+is sound, but there are a few hurdles to overcome to get sample quality up. I have been doing major tweaks to the
+diffusion model and should have something new and much better soon.
\ No newline at end of file