How to fine-tune on a target voice (about 20 hours) without losing robustness #486
Reference: mrq/ai-voice-cloning#486
I trained Tortoise on 100,000 hours of Chinese audio, then fine-tuned it on a 20-hour dataset of a single female voice.
I tried:
Although the synthesized audio sounds good, robustness has declined: the model sometimes repeats words, especially on long sentences.
For example, with the input text "I think .. blabla ......, I am happy",
the synthesized audio says "I think .. blabla ......, I am happy, I am happy".
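To track how often this happens across a test set, one option is to transcribe each synthesized clip with an ASR model and flag cases where the transcript repeats the end of the input text. The sketch below assumes you already have such a transcript (the ASR step is not shown), and `repeated_tail` is a hypothetical helper, not part of Tortoise or ai-voice-cloning. It treats each Chinese character and each Latin word as one token.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Assumption: one token per CJK character, one per run of Latin letters/digits;
    # punctuation and whitespace are ignored so "happy," and "happy" compare equal.
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


def repeated_tail(reference: str, transcript: str, max_n: int = 8) -> Optional[str]:
    """Return the phrase the transcript repeats after the full reference, else None."""
    ref = tokenize(reference)
    hyp = tokenize(transcript)
    # Only consider transcripts that reproduce the whole reference and then add words.
    if len(hyp) <= len(ref) or hyp[: len(ref)] != ref:
        return None
    tail = hyp[len(ref):]
    # Flag it if the extra words duplicate the last words of the reference.
    if len(tail) <= max_n and ref[-len(tail):] == tail:
        return " ".join(tail)
    return None


if __name__ == "__main__":
    print(repeated_tail(
        "I think .. blabla ......, I am happy",
        "I think .. blabla ......, I am happy, I am happy",
    ))
```

The check is deliberately strict (the transcript must match the input exactly before the repetition), so ASR errors will cause misses; a looser comparison such as word error rate would catch more cases at the cost of more false positives.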
How can I fine-tune on the target voice while preserving the base model's robustness?