A Tortoise TTS Model Fine-Tuned to Speak in Russian #259

Open
opened 2023-06-08 12:42:31 +00:00 by SerCeMan · 6 comments

Thank you for creating this project! I've had a lot of fun experimenting with different voices. I'm writing this to share a model that I've fine-tuned to speak in the Russian language. The model can be further fine-tuned to clone other Russian male voices. I've included a few examples of randomly generated voices, as well as one that has been specifically fine-tuned.

https://huggingface.co/SerCe/tortoise-tts-ruslan

I am fairly new to this field, and I am sharing this model in hopes that it might prove useful to others in their research, or simply for entertainment.

Does it require the prompt to be romanized, or is Cyrillic input supported? If so, did you have to create your own tokenizer?

Edit: @nk990, this may be of interest to you.

Author

In my experiments, Cyrillic input works fine. Even the original model is able to handle Cyrillic input, but it would produce output with a very heavy accent. Training for a longer period of time on an extended Russian Open Speech To Text dataset might produce even better results. In my case, I only selected the supposedly cleanest parts of the dataset.

It's by no means perfect, but it produces decent results. You can check the examples that I put on Hugging Face.

@SerCeMan thanks a lot for your work!
I'm trying to fine-tune your model on a specific speaker's dataset. The voice clones perfectly and the intonation is preserved, but there is one problem.
During generation, word stress is sometimes placed incorrectly, for example:

text prompt: шахматы
correct accent: ша́хматы
generated: шaхма́ты

Did you have such problems? Are there solutions?

Author

Hi, @surovaen! Yes, I noticed this issue when I was experimenting with the model. I believe that it might be possible to fix it by training for longer on the generic YouTube dataset before refining it with the Ruslan dataset.
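
Separately from longer training, and untested with this model: one workaround sometimes tried with TTS front ends is to mark the stressed vowel explicitly in the prompt with a combining acute accent (U+0301), as in ша́хматы above. Whether the tokenizer used here keeps or ignores that character is not known, so this is only a sketch of the text preprocessing side. `STRESS_LEXICON`, `mark_stress`, and `strip_stress` are hypothetical names made up for illustration, and the lexicon holds only the one word from this thread.

```python
# Sketch only: insert or remove explicit stress marks in Russian text.
# Assumption: stress is written as U+0301 (combining acute accent)
# placed directly after the stressed vowel.

COMBINING_ACUTE = "\u0301"
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")

# Hypothetical lookup: lowercase word -> 1-based number of the stressed vowel.
STRESS_LEXICON = {"шахматы": 1}


def mark_stress(word, lexicon=STRESS_LEXICON):
    """Return the word with U+0301 after its stressed vowel, if known."""
    vowel_number = lexicon.get(word.lower())
    if vowel_number is None:
        return word  # unknown word: leave it unchanged
    seen = 0
    out = []
    for ch in word:
        out.append(ch)
        if ch in RUSSIAN_VOWELS:
            seen += 1
            if seen == vowel_number:
                out.append(COMBINING_ACUTE)
    return "".join(out)


def strip_stress(text):
    """Remove explicit stress marks, e.g. before comparing transcripts."""
    return text.replace(COMBINING_ACUTE, "")
```

If the marks turn out to have no effect at inference time, they would presumably also need to appear in the fine-tuning transcripts, which again is an assumption rather than something confirmed in this thread.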

Hi bro, I recently got interested in Tortoise TTS and downloaded your Russian model, but I don't really understand where to put it or which folder it goes in. Could you please explain, or point me to a guide if there is one? I'd be glad if you could message me on Telegram: @Blue_Fish33

Hi, thank you for the project! What hardware did you use, and how long did the training take?

Reference: mrq/ai-voice-cloning#259