[Question] Segments: what is their role in training? #468
Reference: mrq/ai-voice-cloning#468
Hey there,
I'm building a tool to process and prepare audiobooks as input for ai-voice-cloning, and I've reached the point where I want to transcribe my audio files. After digging into the ai-voice-cloning code (utils.py in particular), I found that I had to use the "medium.en" model to get the same segments as you (2 instead of 4, for instance).
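To make the comparison concrete, here is a minimal sketch of how the segments could be listed. It assumes the openai-whisper Python package as the transcription backend, which may not match what ai-voice-cloning is configured to use. The helper names `transcribe` and `list_segments` are hypothetical, and the sample data at the bottom is hand-written for illustration, not real output.

```python
from typing import Any


def transcribe(path: str, model_name: str = "medium.en") -> dict[str, Any]:
    """Transcribe one audio file with openai-whisper (assumed to be installed)."""
    import whisper  # lazy import: only needed when actually transcribing

    model = whisper.load_model(model_name)
    # The returned dict holds the full "text" plus a "segments" list.
    return model.transcribe(path)


def list_segments(result: dict[str, Any]) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for each segment of a Whisper-style result."""
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in result.get("segments", [])
    ]


if __name__ == "__main__":
    # Hand-written result in the shape Whisper returns; values are illustrative.
    sample = {
        "text": " Hello there. How are you?",
        "segments": [
            {"id": 0, "start": 0.0, "end": 1.5, "text": " Hello there."},
            {"id": 1, "start": 1.5, "end": 3.0, "text": " How are you?"},
        ],
    }
    for start, end, text in list_segments(sample):
        print(f"{start:6.2f} -> {end:6.2f}  {text}")
```

Running `list_segments(transcribe("clip.wav", "medium.en"))` and again with another model name on the same file (the file name is a placeholder) would show whether the segment count and boundaries differ between models.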
But what I'm wondering now is: what exactly are segments? How are they calculated and, above all, does a different segmentation change the training much?
Thank you for any light you could shed on this!