Update readme

2022-03-10 23:32:35 -07:00 · f36bb6006d
commit f36bb6006d
parent 873c399ff1
1 changed file with 33 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -36,7 +36,32 @@ Based on [ImprovedDiffusion by openai](https://github.com/openai/improved-diffus

 ## How do I use this?

-<incoming>
+Check out the colab: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR?usp=sharing
+
+Or run it locally on a computer with a GPU (>=16GB of VRAM):
+```shell
+git clone https://github.com/neonbjb/tortoise-tts.git
+cd tortoise-tts
+pip install -r requirements.txt
+python do_tts.py
+```
+
+## Hand-picked TTS samples
+
+I generated ~250 samples from 23 text prompts and 8 voices. The text prompts have never been seen by the model. The
+voices were pulled from the training set.
+
+All of the samples can be found in the results/ folder of this repo.
+
+I handpicked a few to show what the model is capable of:
+- [Atkins - Road not taken](results/favorites/atkins_road_not_taken.wav)
+- [Dotrice - Rolling Stone interview](results/favorites/dotrice_rollingstone.wav)
+- [Dotrice - 'Ornaments' from tacotron test set](results/favorites/dotrice_tacotron_samp1.wav)
+- [Kennard - 'Acute emotional intelligence' from tacotron test set](results/favorites/kennard_tacotron_samp2.wav)
+- [Mol - Because I could not stop for death](results/favorites/mol_dickenson.wav)
+- [Mol - Obama](results/favorites/mol_obama.wav)
+
+Prosody is remarkably good for poetry, even though the model was never trained on poetry.

 ## How do I train this?

@@ -45,3 +70,9 @@ resources for the better part of 6 months. It uses a dataset I've gathered, refi
 a lot of audio data which I cannot distribute because of copyright or the lack of an open license.

 With that said, I'm willing to help you out if you really want to give it a shot. DM me.
+
+## Looking forward
+
+I'm not satisfied with this yet. Treat this as a "sneak peek" and check back in a couple of months. I think the concept
+is sound, but there are a few hurdles to overcome to get sample quality up. I have been making major tweaks to the
+diffusion model and should have something new and much better soon.