[Feature request] Add support for whisper.cpp #37

Open
opened 2023-02-25 21:40:19 +00:00 by lightmare · 6 comments
Contributor

Preface:
This should probably be considered low priority, a nice-to-have.

whisper.cpp is an alternative C++ implementation of Whisper.

I have prepared a fork of python bindings for the project.
The current interface works slightly differently from openai's whisper:

>>> from whispercpp import Whisper
>>> w = Whisper('tiny', models_dir='./models/', language=b'en')
>>> result = w.transcribe("myfile.mp3")
>>> text = [l.strip() for l in w.extract_text(result)]
>>> text
['This is a test.']

Here are some evaluations off the top of my head:

Pros:

  • Runs on CPU, so it does not interfere with work done on GPU
  • Code of bindings is very simple and easy to change
  • Easy installation via pip: pip install git+https://git.ecker.tech/lightmare/whispercpp.py
  • Except for automatic checks for models by download_model, no network calls are made (if the file exists, nothing is called)

Caveats:

  • Runs on CPU, might be slower
  • Requires additional disk space for ggml models (https://github.com/ggerganov/whisper.cpp/tree/master/models)
  • Current bindings implementation uses ffmpeg-python for automatic conversion. This could be changed, in case this dependency should be dropped in the future
  • Not completely silent: the output of whisper_model_load is hardcoded by whisper.cpp
  • Official limitations (https://github.com/ggerganov/whisper.cpp/#limitations)
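
To illustrate the "if the file exists, nothing is called" behaviour listed under Pros, here is a minimal sketch of that kind of check. This is not the bindings' actual download_model code: ensure_model and the fetch callback are hypothetical names, and the ggml-<name>.bin file name follows the naming used for whisper.cpp's ggml models.

```python
from pathlib import Path
from typing import Callable


def ensure_model(name: str, models_dir: str,
                 fetch: Callable[[str, Path], None]) -> Path:
    """Return the local path of a ggml model, fetching it only if missing.

    `fetch` is a hypothetical callback that downloads model `name`
    and writes it to the given target path.
    """
    target = Path(models_dir) / f"ggml-{name}.bin"
    if target.exists():
        # File already present: no network call is made.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(name, target)
    return target
```

With a check like this, the only network traffic happens the first time a given model is requested.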

Further notes:

  • Memory:
      • If memory usage is a concern, then it could be started in a child process and terminated when no longer required
      • Some mem usage stats of the models: https://github.com/ggerganov/whisper.cpp/#memory-usage
  • Language selection:
      • openai's whisper provides a dict with short and long forms of language codes (https://github.com/openai/whisper/blob/v20230124/whisper/tokenizer.py#L10-L110) and whisper.cpp is compatible with it (https://github.com/ggerganov/whisper.cpp/blob/v1.2.0/whisper.cpp#L119-L219)
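
The child-process idea from the memory note could look roughly like the sketch below. It is only an illustration: transcribe_in_child and CHILD_SCRIPT are hypothetical names, and the child script returns a placeholder instead of loading a model, so it runs without whispercpp installed. The point is that whatever memory the child allocates is released when it exits.

```python
import json
import subprocess
import sys

# Hypothetical worker script. A real worker would import whispercpp,
# load the model and transcribe the file given in sys.argv[1].
CHILD_SCRIPT = """
import json, sys
path = sys.argv[1]
json.dump({"file": path, "text": ["placeholder transcript"]}, sys.stdout)
"""


def transcribe_in_child(path: str, timeout: float = 600.0) -> dict:
    """Run the worker in a separate interpreter and return its JSON result."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, path],
        capture_output=True,  # collect the child's stdout/stderr
        text=True,
        timeout=timeout,      # kill the child if it runs too long
        check=True,           # raise if the child exits with an error
    )
    return json.loads(proc.stdout)
```

Passing the result back as JSON over stdout keeps the parent free of any model state; a longer-lived worker with a queue would avoid reloading the model for every file, at the cost of holding the memory for longer.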
Owner

I'll take a gander and see about adding it in whenever I get a chance (maybe soon). For sure, it'll exist as a toggle between the two. I definitely see it as promising despite it being CPU-only, as I would be able to leverage using the large models on my local machine.

Ironically, this morning I sat through maybe four hours of transcribing a dataset that I might document later. The takeaway was how long it took to sit through, with a result of 19420 lines, although I'm not sure whispercpp would have given any speedup, as I was using the large model on an A4000.

My other initial concern would have been the lack of start and end times for feature parity, but it seems extract_text_and_timestamps takes care of that too.
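
A toggle between the two backends could be as simple as the dispatch table sketched below. All names here are hypothetical, and the two backend functions are stand-ins that only tag their input, so this is not how the project actually wires it up.

```python
from typing import Callable, Dict, List


def transcribe_openai(path: str) -> List[str]:
    # Stand-in: a real version would call openai's whisper here.
    return [f"[whisper] {path}"]


def transcribe_cpp(path: str) -> List[str]:
    # Stand-in: a real version would call the whispercpp bindings here.
    return [f"[whispercpp] {path}"]


# Maps the value of the toggle to the function that does the work.
BACKENDS: Dict[str, Callable[[str], List[str]]] = {
    "whisper": transcribe_openai,
    "whispercpp": transcribe_cpp,
}


def transcribe(path: str, backend: str = "whisper") -> List[str]:
    """Dispatch to the selected backend, rejecting unknown names."""
    try:
        fn = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return fn(path)
```

Keeping both behind one function means the rest of the pipeline does not need to know which implementation produced the lines.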

Owner

Just my luck with anything MSVC related:

Building wheels for collected packages: whispercpp
  Building wheel for whispercpp (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for whispercpp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [3 lines of output]
      ggml.c
      C:\Users\User\AppData\Local\Temp\pip-req-build-dnvdcz40\whisper.cpp\ggml.h(177): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.34.31933\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for whispercpp
Failed to build whispercpp
ERROR: Could not build wheels for whispercpp, which is required to install pyproject.toml-based projects

I'll see if it plays nice under MSYS2, if I can get it to leverage GCC. Never mind, I forgot how doing literally anything with PIP under MSYS2 is actual cock and ball torture. I'll see what I can do to get it to work on my machine™.

I suppose I might have to reinstall MSVC build tools, although I'd imagine the normal end user won't want to go through this unless a prebuilt .whl (or however precompiled python binaries are distributed) is supplied.

Owner

Forgot to mention I think yesterday when I remembered to commit it: added support for it in commit 6925ec731b.

Although, I couldn't get it to compile for the life of me on Windows. I'll have to test it on a paperspace instance, but it should work.

Author
Contributor

> Forgot to mention I think yesterday when I remembered to commit it: added support for it in commit 6925ec731b.

Very nice, thank you.

> Although, I couldn't get it to compile for the life of me on Windows. I'll have to test it on a paperspace instance, but it should work.

I must admit, I haven't tested it on a Windows machine, but the setup shouldn't really use any esoteric wizardry... The failing stddef.h include is also very odd.

The whisper.cpp github workflow has a recipe for Windows (https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144). Maybe it had something to do with the SDL2 version?

Owner

I tried the other day on Windows but absolutely couldn't get it to be happy. I imagine if it's giving me hell, it'd be hell for any other Windows user to try and get it to compile anyhow. Oh well.

It was easy peasy to get it on Linux after setting up my dedicated system for it. Just one small bug involving not converting a string to bytes, and it works fine. The large model takes forever (naturally), but I think it already took forever anyways on normal whisper. With the base model it's pretty quick despite being CPU only.

Author
Contributor

> I tried the other day on Windows but absolutely couldn't get it to be happy. I imagine if it's giving me hell, it'd be hell for any other Windows user to try and get it to compile anyhow. Oh well.

I have updated whispercpp to a newer version. Haven't tested it on Windows, though.

> It was easy peasy to get it on Linux after setting up my dedicated system for it. Just one small bug involving not converting a string to bytes, and it works fine. The large model takes forever (naturally), but I think it already took forever anyways on normal whisper. With the base model it's pretty quick despite being CPU only.

The requirement of passing the language param as bytes is indeed a bit odd for Python. I might release a new version in the future that supports both.
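
Until the bindings accept both, a caller could normalize the language on its own side. The sketch below is an assumption about how that might look, not part of whispercpp: normalize_language is a hypothetical helper, and LANGUAGES is only a three-entry excerpt in the same code-to-name shape as the dict in openai's whisper tokenizer.

```python
from typing import Union

# Tiny illustrative excerpt; the real table in openai's whisper is much longer.
LANGUAGES = {
    "en": "english",
    "de": "german",
    "ja": "japanese",
}
# Reverse lookup so long forms such as "english" also resolve to a code.
NAME_TO_CODE = {name: code for code, name in LANGUAGES.items()}


def normalize_language(language: Union[str, bytes]) -> bytes:
    """Accept a short or long language form as str or bytes, return the code as bytes."""
    if isinstance(language, bytes):
        language = language.decode("ascii")
    language = language.strip().lower()
    if language in LANGUAGES:
        code = language
    elif language in NAME_TO_CODE:
        code = NAME_TO_CODE[language]
    else:
        raise ValueError(f"unsupported language: {language!r}")
    return code.encode("ascii")
```

The result can then be passed as the language argument in the form the current interface expects, e.g. language=normalize_language("english").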

Reference: mrq/ai-voice-cloning#37