diff --git a/examples/naturalspeech_comparison/fibers/naturalspeech.mp3 b/examples/naturalspeech_comparison/fibers/naturalspeech.mp3
new file mode 100644
index 0000000..57e540e
Binary files /dev/null and b/examples/naturalspeech_comparison/fibers/naturalspeech.mp3 differ
diff --git a/examples/naturalspeech_comparison/fibers/tortoise.mp3 b/examples/naturalspeech_comparison/fibers/tortoise.mp3
new file mode 100644
index 0000000..1788df8
Binary files /dev/null and b/examples/naturalspeech_comparison/fibers/tortoise.mp3 differ
diff --git a/examples/naturalspeech_comparison/lax/naturalspeech.mp3 b/examples/naturalspeech_comparison/lax/naturalspeech.mp3
new file mode 100644
index 0000000..ebcb779
Binary files /dev/null and b/examples/naturalspeech_comparison/lax/naturalspeech.mp3 differ
diff --git a/examples/naturalspeech_comparison/lax/tortoise.mp3 b/examples/naturalspeech_comparison/lax/tortoise.mp3
new file mode 100644
index 0000000..2901215
Binary files /dev/null and b/examples/naturalspeech_comparison/lax/tortoise.mp3 differ
diff --git a/examples/naturalspeech_comparison/maltby/naturalspeech.mp3 b/examples/naturalspeech_comparison/maltby/naturalspeech.mp3
new file mode 100644
index 0000000..4cee574
Binary files /dev/null and b/examples/naturalspeech_comparison/maltby/naturalspeech.mp3 differ
diff --git a/examples/naturalspeech_comparison/maltby/tortoise.mp3 b/examples/naturalspeech_comparison/maltby/tortoise.mp3
new file mode 100644
index 0000000..1831056
Binary files /dev/null and b/examples/naturalspeech_comparison/maltby/tortoise.mp3 differ
diff --git a/tortoise_v2_examples.html b/tortoise_v2_examples.html
index 6b4b6c7..2fb68d3 100644
--- a/tortoise_v2_examples.html
+++ b/tortoise_v2_examples.html
@@ -32,10 +32,10 @@ available at https://github.co

Short-form


-Compared to Tacotron2 (with the LJSpeech voice): 🐢
+Comparisons (with the LJSpeech voice): 🐢

LJSpeech is a popular dataset used to train small-scale TTS models. TorToiSe is a multi-voice model, following is how
-it renders the LJSpeech voice with no fine-tuning, compared with results for the same text from the popular Tacotron2
-model paired with the Waveglow transformer:

+it renders the LJSpeech voice with and without fine-tuning, compared with results for the same text from the popular Tacotron2
+model paired with the Waveglow vocoder.

@@ -50,6 +50,22 @@ model paired with the Waveglow transformer:

Tacotron2+Waveglow | TorToiSe | TorToiSe Finetuned


+NaturalVoice is a SOTA TTS engine developed by Microsoft Research Asia in May 2022. It features realistic prosody
+and end-to-end generation with no need for a vocoder. While not much has actually been released about this model other
+than five samples, those samples are quite good and I would consider this the most competitive TTS engine out there
+right now.

+ + + + + + +
Natural Voice | TorToiSe Finetuned





+

+It is important to note that it is not actually fair to compare any of these models: Tortoise is a multi-voice probabilistic
+model trained on millions of hours of speech with an exceptionally slow inference time. Tacotron and NaturalVoice are efficient,
+fast, single-voice models trained on 24 hours of speech. Unfortunately, there isn't much in the way of actually comparable
+research to Tortoise.

All Results 🐢

Following are all the results from which the hand-picked results were drawn from. Also included is the reference