Feature Request: Use WhisperX instead of Whisper for preparing dataset #45
Reference: mrq/ai-voice-cloning#45
I tried using the Prepare Dataset option, and it does a somewhat poor job with timestamps on the generated dataset: once the audio is split up, the text outputs don't quite match the clips. I tried modifying the code to use WhisperX instead, and it seemed to do a much better job, although I still had to add a window of about 0.1s on either side of each split clip for more accuracy. It still misses a little of the audio, but the majority of the audio/text pairs are much more accurate time-wise.
The WhisperX repo is here: https://github.com/m-bain/whisperX
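For anyone wanting to try the padding part, here is a minimal sketch of the 0.1s window described above. It is not the code from this repo or from WhisperX. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` field, and the `pad_segments` helper and `total_duration` parameter are hypothetical names made up for illustration.

```python
def pad_segments(segments, pad=0.1, total_duration=None):
    """Return copies of the segments with start/end widened by `pad` seconds.

    Assumption: each segment is a dict with "start" and "end" in seconds.
    Start is clamped at 0; end is clamped at `total_duration` when given.
    """
    padded = []
    for seg in segments:
        start = max(0.0, seg["start"] - pad)
        end = seg["end"] + pad
        if total_duration is not None:
            end = min(total_duration, end)
        # Copy the segment so the caller's list is left untouched.
        padded.append({**seg, "start": start, "end": end})
    return padded


# Made-up example segments, only to show the call.
segments = [
    {"start": 0.05, "end": 1.50, "text": "Hello there."},
    {"start": 1.75, "end": 3.00, "text": "Second line."},
]
padded = pad_segments(segments, pad=0.1, total_duration=3.05)
```

Padding like this can make neighbouring clips overlap slightly, so a larger window may pull in a bit of the next line's audio.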
Very impressed with WhisperX. Added a PR: #67
It's implemented, but with headaches.