A Tortoise TTS Model Fine-Tuned to Speak in Russian #259


SerCeMan · 2023-06-08T12:42:31Z

SerCeMan commented

2023-06-08 12:42:31 +00:00

Thank you for creating this project! I've had a lot of fun experimenting with different voices. I'm writing this to share a model that I've fine-tuned to speak in the Russian language. The model can be further fine-tuned to clone other Russian male voices. I've included a few examples of randomly generated voices, as well as one that has been specifically fine-tuned.

https://huggingface.co/SerCe/tortoise-tts-ruslan

I am fairly new to this field, and I am sharing this model in hopes that it might prove useful to others in their research, or simply for entertainment.



psammites commented

2023-06-08 13:25:57 +00:00

Does it require the prompt to be romanized, or is Cyrillic input supported? If so, did you have to create your own tokenizer?

Edit: @nk990, this may be of interest to you.


SerCeMan commented

2023-06-09 00:06:46 +00:00

In my experiments, Cyrillic input works fine. Even the original model can handle Cyrillic input, but it produces output with a very heavy accent. Training for a longer period of time on a larger part of the Russian Open Speech To Text dataset might produce even better results. In my case, I only selected the supposedly cleanest parts of the dataset.

It's by no means perfect, but it produces decent results; you can check the examples that I put on Hugging Face.


surovaen commented

2023-08-02 00:15:33 +00:00

@SerCeMan, thanks a lot for your work!
I'm trying to fine-tune your model on a specific speaker's dataset. The voice clones perfectly and the intonation is preserved, but there is one problem.
During generation, word stress is sometimes placed incorrectly, for example:

text prompt: шахматы
correct stress: ша́хматы
generated: шахма́ты

Did you run into this problem? Is there a solution?
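For reference, the stress notation in the example above is conventionally written as the stressed vowel followed by a combining acute accent (U+0301), so the marked word is one code point longer than the plain one. The sketch below, using hypothetical helper names `mark_stress` and `strip_stress` that are not part of Tortoise TTS, shows how such marks can be added or removed programmatically. It assumes stress is given as a 1-based syllable index (one syllable per Russian vowel letter). Whether this fine-tuned model's tokenizer keeps, drops, or misreads these marks is not stated in the thread, so treat this as an illustration of the notation only, not as a fix for the stress problem.

```python
# Hypothetical helpers for Russian stress marks (not part of Tortoise TTS).
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, placed after the stressed vowel
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def mark_stress(word: str, syllable: int) -> str:
    """Return `word` with a combining acute accent after the vowel of the
    given 1-based syllable (each Russian vowel letter counts as one syllable)."""
    if syllable < 1:
        raise ValueError("syllable must be >= 1")
    count = 0
    for i, ch in enumerate(word):
        if ch in RUSSIAN_VOWELS:
            count += 1
            if count == syllable:
                return word[: i + 1] + COMBINING_ACUTE + word[i + 1 :]
    raise ValueError(f"{word!r} has fewer than {syllable} vowels")


def strip_stress(text: str) -> str:
    """Remove every combining acute accent, leaving plain Cyrillic text."""
    return text.replace(COMBINING_ACUTE, "")


if __name__ == "__main__":
    correct = mark_stress("шахматы", 1)    # stress on the first syllable
    generated = mark_stress("шахматы", 2)  # stress as the model produced it
    print(correct, generated)
    print(strip_stress(correct) == "шахматы")  # True
```

Stripping the marks before synthesis keeps the input identical to unmarked text, which is the safe default when it is unknown how the tokenizer treats combining characters.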


SerCeMan commented

2023-08-05 05:46:05 +00:00

Hi, @surovaen! Yes, I noticed this issue when I was experimenting with the model. I believe it might be possible to fix it by training longer on the generic YouTube dataset before refining the model on the Ruslan dataset.

Dimanchik commented

2023-08-08 08:33:12 +00:00

Hi! I just recently got interested in Tortoise TTS. I downloaded your Russian model, but I don't really understand where to put it or in which folder. Can you please explain, or is there a guide? I'd be glad if you could message me on Telegram: @Blue_Fish33