A Tortoise TTS Model Fine-Tuned to Speak in Russian #259
Reference: mrq/ai-voice-cloning#259
Thank you for creating this project! I've had a lot of fun experimenting with different voices. I'm writing this to share a model that I've fine-tuned to speak in the Russian language. The model can be further fine-tuned to clone other Russian male voices. I've included a few examples of randomly generated voices, as well as one that has been specifically fine-tuned.
https://huggingface.co/SerCe/tortoise-tts-ruslan
I am fairly new to this field, and I am sharing this model in hopes that it might prove useful to others in their research, or simply for entertainment.
Does it require the prompt to be romanized, or is Cyrillic input supported? If Cyrillic is supported, did you have to create your own tokenizer?
Edit: @nk990, this may be of interest to you.
In my experiments, Cyrillic input works fine. Even the original model can handle Cyrillic input, but it produces output with a very heavy accent. Training for a longer period on a larger portion of the Russian Open Speech To Text dataset might produce even better results. In my case, I only selected the supposedly cleanest parts of the dataset. It's by no means perfect, but it produces decent results; you can check the examples that I put on Hugging Face.
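Since the question above is about whether Cyrillic prompts are handled, one practical check is making sure a prompt is actually pure Cyrillic. Latin look-alike letters (for example a Latin "a" typed in place of the Cyrillic "а") are easy to miss and would likely be treated as different characters by any tokenizer. The following is a minimal sketch using only the standard library; the helper name `non_cyrillic_letters` is hypothetical and not part of Tortoise or ai-voice-cloning.

```python
import unicodedata


def non_cyrillic_letters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, char, Unicode name) for every letter that is not Cyrillic.

    Non-letters (spaces, punctuation, combining stress marks) are ignored,
    because str.isalpha() is False for them.
    """
    found = []
    for i, ch in enumerate(text):
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if not name.startswith("CYRILLIC"):
                found.append((i, ch, name))
    return found


# Pure Cyrillic prompt: nothing is reported.
print(non_cyrillic_letters("\u0448\u0430\u0445\u043c\u0430\u0442\u044b"))
# Same word, but the second letter is a Latin 'a' (U+0061).
print(non_cyrillic_letters("\u0448\u0061\u0445\u043c\u0430\u0442\u044b"))
```

Running a check like this over training transcripts and prompts before fine-tuning is a cheap way to rule out mixed-script input as a source of odd pronunciations.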
@SerCeMan thanks a lot for your work!
I'm trying to fine-tune your model on a specific speaker's dataset. The voice clones perfectly and the intonation is preserved, but there is one problem.
During generation, word stress is sometimes placed incorrectly, for example:
text prompt: шахматы ("chess")
correct stress: ша́хматы
generated: шахма́ты
Have you run into this problem? Are there any solutions?
Hi, @surovaen! Yes, I noticed this issue when I was experimenting with the model. I believe it might be possible to fix it by training longer on the generic YouTube dataset before refining the model on the Ruslan dataset.
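For experimenting with the stress problem in the ша́хматы example, note that the stress mark in such text is normally a combining acute accent (U+0301) placed right after the stressed vowel. The sketch below adds or strips that mark, for instance to build test prompts or to clean transcripts before comparing them. The helpers `add_stress` and `strip_stress` are hypothetical; whether the tokenizer used by this fine-tuned model keeps U+0301, and whether the model would learn to follow it, is an assumption that the thread does not confirm and that would need to be tested.

```python
# Combining acute accent, commonly used to mark stress in Russian text.
COMBINING_ACUTE = "\u0301"

# Russian vowel letters (upper and lower case).
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def add_stress(word: str, vowel_number: int) -> str:
    """Insert a stress mark after the vowel_number-th vowel (1-based)."""
    count = 0
    for i, ch in enumerate(word):
        if ch in VOWELS:
            count += 1
            if count == vowel_number:
                return word[: i + 1] + COMBINING_ACUTE + word[i + 1 :]
    raise ValueError(f"{word!r} has fewer than {vowel_number} vowels")


def strip_stress(text: str) -> str:
    """Remove all combining acute accents, leaving the letters themselves intact."""
    return text.replace(COMBINING_ACUTE, "")


word = "шахматы"
print(add_stress(word, 1))                      # stress on the first vowel
print(strip_stress(add_stress(word, 2)) == word)  # round-trips to the plain word
```

Only U+0301 is removed, so letters such as ё and й are left untouched rather than being decomposed.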
Hi, I only recently got interested in Tortoise TTS. I downloaded your Russian model, but I don't really understand where to put it or in which folder. Could you please explain, or is there a guide? I'd be glad if you could message me on Telegram: @Blue_Fish33.
Hi, thank you for the project! What hardware did you use, and how long did the training take?