[Question] Segments: what is their role in the training? #468

Open
opened 2024-01-25 20:00:42 +07:00 by DoctorPopi · 0 comments

Hey there,

I'm building a tool to help me process and prepare audiobooks to feed the ai-voice-cloning tool. I've come to the point where I want to transcribe my audio files. After diving into the ai-voice-cloning code (utils.py in particular), I found that I had to use the "medium.en" model to get the same segments as you (for instance, 2 segments instead of 4).

But what I'm wondering right now is: what exactly are segments? How are they calculated and, above all, does a different segmentation change the training much?

Thank you for any light you could shed on this!

Reference: mrq/ai-voice-cloning#468