[Feature request] Add support for whisper.cpp #37
Reference: mrq/ai-voice-cloning#37
Preface:
This should probably be considered low priority, a nice-to-have.
whisper.cpp is an alternative C++ implementation of Whisper.
I have prepared a fork of python bindings for the project.
The current interface works slightly differently from openai's whisper:
Here are some evaluations off the top of my head:
Pros:
pip install git+https://git.ecker.tech/lightmare/whispercpp.py
download_model, no network calls are made (if the file exists, nothing is called)
Caveats:
ffmpeg-python for automatic conversion. This could be changed, in case this dependency should be dropped in the future
whisper_model_load is hardcoded by whisper.cpp
Further notes:
I'll take a gander and see about adding it in whenever I get a chance (maybe soon). For sure, it'll exist as a toggle between the two. I definitely see it as promising despite it being CPU-only, as I would be able to leverage using the large models on my local machine.
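A toggle like that could be sketched roughly as below. This is only an illustration, not the code that ended up in the repo: the names `BACKENDS`, `register_backend` and `transcribe` are hypothetical, and the two registered backends are stand-ins so the snippet runs without either library installed. Real entries would wrap openai's whisper and whispercpp.

```python
from typing import Callable, Dict, List, Tuple

# A segment is (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]
Backend = Callable[[str], List[Segment]]

# Hypothetical registry mapping a backend name to a transcription function.
BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str, fn: Backend) -> None:
    """Make a transcription function selectable by name."""
    BACKENDS[name] = fn


def transcribe(path: str, backend: str = "openai") -> List[Segment]:
    """Dispatch to the selected backend, failing loudly on an unknown name."""
    try:
        fn = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; choose from {sorted(BACKENDS)}"
        ) from None
    return fn(path)


# Stand-in backends (assumptions for illustration only).
register_backend("openai", lambda path: [(0.0, 1.0, f"openai:{path}")])
register_backend("whispercpp", lambda path: [(0.0, 1.0, f"whispercpp:{path}")])
```

Keeping both behind one function that returns the same segment shape means the rest of the pipeline would not need to know which implementation produced the transcript.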
Ironically, this morning I sat through maybe four hours of transcribing a dataset that I might document later. The main takeaway was how long it took to sit through for a result of 19420 lines. I'm not sure using whispercpp would have given any speedup, though, as I was using the large model on an A4000.
My other initial concern would have been a lack of start and end times for feature parity, but it seems extract_text_and_timestamps takes care of that too.
Just my luck with anything MSVC related:
I'll see if it plays nice under MSYS2, if I can get it to leverage GCC.
Nevermind, I forgot how doing literally anything with pip under MSYS2 is actual cock and ball torture. I'll see what I can do to get it to work on my machine™.
I suppose I might have to reinstall the MSVC build tools, although I'd imagine the normal end user won't want to go through this unless a .whl (or however precompiled Python binaries are distributed) is supplied.
Forgot to mention, I think yesterday when I remembered to commit it: added support for it in commit 6925ec731b.
Although, I couldn't get it to compile for the life of me on Windows. I'll have to test it on a Paperspace instance, but it should work.
Very nice, thank you.
I must admit, I haven't tested it on a Windows machine, but the setup shouldn't really use any esoteric wizardry... The failing stddef.h include is also very odd.
The whisper.cpp GitHub workflow has a recipe for Windows. Maybe it had something to do with the SDL2 version?
I tried the other day on Windows but absolutely couldn't get it to be happy. I imagine if it's giving me hell, it'd be hell for any other Windows user to try and get it to compile anyhow. Oh well.
It was easy peasy to get it on Linux after setting up my dedicated system for it. Just one small bug involving not converting a string to bytes, and it works fine. The large model takes forever (naturally), but I think it already took forever anyways on normal whisper. With the base model it's pretty quick despite being CPU only.
I have updated whispercpp to a newer version. Haven't tested it on Windows, though.
The requirement of passing the language param as bytes is indeed a bit odd for Python. I might release a new version in the future that supports both.
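Until then, a caller could normalise the value before handing it over. A minimal sketch, assuming the binding wants UTF-8 encoded bytes; the helper name `as_language_bytes` is made up for illustration and is not part of whispercpp:

```python
from typing import Union


def as_language_bytes(language: Union[str, bytes], encoding: str = "utf-8") -> bytes:
    """Accept a language code as str or bytes and always return bytes."""
    if isinstance(language, bytes):
        # Already in the form the binding expects; pass through unchanged.
        return language
    if isinstance(language, str):
        return language.encode(encoding)
    raise TypeError(
        f"language must be str or bytes, got {type(language).__name__}"
    )
```

With this, both `as_language_bytes("en")` and `as_language_bytes(b"en")` yield the same bytes object, so the calling code does not have to care which form it was given.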