diff --git a/examples/naturalspeech_comparison/fibers/naturalspeech.mp3 b/examples/naturalspeech_comparison/fibers/naturalspeech.mp3
new file mode 100644
index 0000000..57e540e
Binary files /dev/null and b/examples/naturalspeech_comparison/fibers/naturalspeech.mp3 differ
diff --git a/examples/naturalspeech_comparison/fibers/tortoise.mp3 b/examples/naturalspeech_comparison/fibers/tortoise.mp3
new file mode 100644
index 0000000..1788df8
Binary files /dev/null and b/examples/naturalspeech_comparison/fibers/tortoise.mp3 differ
diff --git a/examples/naturalspeech_comparison/lax/naturalspeech.mp3 b/examples/naturalspeech_comparison/lax/naturalspeech.mp3
new file mode 100644
index 0000000..ebcb779
Binary files /dev/null and b/examples/naturalspeech_comparison/lax/naturalspeech.mp3 differ
diff --git a/examples/naturalspeech_comparison/lax/tortoise.mp3 b/examples/naturalspeech_comparison/lax/tortoise.mp3
new file mode 100644
index 0000000..2901215
Binary files /dev/null and b/examples/naturalspeech_comparison/lax/tortoise.mp3 differ
diff --git a/examples/naturalspeech_comparison/maltby/naturalspeech.mp3 b/examples/naturalspeech_comparison/maltby/naturalspeech.mp3
new file mode 100644
index 0000000..4cee574
Binary files /dev/null and b/examples/naturalspeech_comparison/maltby/naturalspeech.mp3 differ
diff --git a/examples/naturalspeech_comparison/maltby/tortoise.mp3 b/examples/naturalspeech_comparison/maltby/tortoise.mp3
new file mode 100644
index 0000000..1831056
Binary files /dev/null and b/examples/naturalspeech_comparison/maltby/tortoise.mp3 differ
diff --git a/tortoise_v2_examples.html b/tortoise_v2_examples.html
index 6b4b6c7..2fb68d3 100644
--- a/tortoise_v2_examples.html
+++ b/tortoise_v2_examples.html
@@ -32,10 +32,10 @@ available at https://github.co
LJSpeech is a popular dataset used to train small-scale TTS models. TorToiSe is a multi-voice model, following is how
-it renders the LJSpeech voice with no fine-tuning, compared with results for the same text from the popular Tacotron2
-model paired with the Waveglow transformer:Short-form
-Compared to Tacotron2 (with the LJSpeech voice): 🐢
+Comparisons (with the LJSpeech voice): 🐢
Tacotron2+Waveglow | TorToiSe | TorToiSe Finetuned |
---|---|
+NaturalSpeech is a SOTA TTS engine developed by Microsoft Research Asia in May 2022. It features realistic prosody
+and end-to-end generation with no need for a separate vocoder. While little about this model has actually been released other
+than five samples, those samples are quite good, and I would consider this the most competitive TTS engine out there
+right now.
+NaturalSpeech | TorToiSe Finetuned |
+
---|---|
+ |
+It is important to note that it is not actually fair to compare any of these models: Tortoise is a multi-voice probabilistic
+model trained on millions of hours of speech with an exceptionally slow inference time. Tacotron and NaturalSpeech are efficient,
+fast, single-voice models trained on 24 hours of speech. Unfortunately, there isn't much in the way of actually comparable
+research to Tortoise.
Following are all the results from which the hand-picked results were drawn from. Also included is the reference