Adding faster-whisper backend #222
Hi,
There's yet another implementation of Whisper out there called faster-whisper, and I've found it to be ~2-3x faster when I've been doing transcriptions for training VITS models.
I'm not a coder, but I tried implementing it in the AI Voice Cloning UI, in case anyone else wants to try to replicate it.
It'll probably whine about some CUDA library not being found in the path, so find wherever it's supposed to live on your system, then:

source ./venv/bin/activate
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:path_to_missing_libs_here
deactivate
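If what it's missing is the cuBLAS/cuDNN libraries that pip pulled into the venv (the nvidia-cublas-cu* / nvidia-cudnn-cu* wheels — an assumption on my part, your missing library may live elsewhere), a quick way to find them is to print their directories from inside the activated venv. A minimal sketch:

# Sketch: locate pip-installed cuBLAS/cuDNN library directories inside the venv.
# Assumes the nvidia-cublas-cu* / nvidia-cudnn-cu* wheels are what's missing.
import os
import nvidia.cublas.lib
import nvidia.cudnn.lib

# Print a colon-separated path suitable for appending to LD_LIBRARY_PATH.
print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))

Whatever it prints is what you'd substitute for path_to_missing_libs_here above.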
Code changes:
Change compute_type to load the model as int8_float16, float32, or float16, whichever suits your hardware.
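Those strings come from CTranslate2, which faster-whisper runs on ("float16", "int8_float16", "int8", "float32", etc.). A hypothetical helper for picking one — not part of src/utils.py, just an illustration — might look like:

# Hypothetical helper (not in the repo): choose a CTranslate2 compute_type
# for faster-whisper based on the target device and available memory.
def pick_compute_type(device: str, low_vram: bool = False) -> str:
    if device == "cuda":
        # int8_float16 trades a little accuracy for roughly half the VRAM of float16
        return "int8_float16" if low_vram else "float16"
    # int8 is the usual CPU choice; float32 is the safest fallback
    return "int8"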
~line 53 in src/utils.py:

WHISPER_BACKENDS = ["openai/whisper", "lightmare/whispercpp", "m-bain/whisperx", "guillaumekln/faster-whisper"]
~line 2020 in src/utils.py:

if args.whisper_backend == "guillaumekln/faster-whisper":
    from faster_whisper import WhisperModel
~line 3696 in src/utils.py:

elif args.whisper_backend == "guillaumekln/faster-whisper":
    # or run on GPU with INT8:
    #   model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
    # or run on CPU with INT8:
    #   model = WhisperModel(model_size, device="cpu", compute_type="int8")
    from faster_whisper import WhisperModel
    try:
        # is it possible for the model to fit in VRAM but still go OOM later while running on the data?
        whisper_model = WhisperModel(model_name, device="cuda", compute_type="float32")
    except Exception:
        print("Out of VRAM, falling back to loading Whisper on the CPU.")
        whisper_model = WhisperModel(model_name, device="cpu", compute_type="float32")
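For anyone wiring this into the transcription path: unlike openai/whisper, faster-whisper's transcribe() returns a lazy generator of segments plus an info object rather than a result dict, so the caller has to iterate it. A minimal sketch of how the model loaded above would be used (the file name and beam_size are just illustrative, not repo settings):

# Sketch: using the faster-whisper model loaded above.
segments, info = whisper_model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language} (probability {info.language_probability:.2f})")

# segments is a generator; the actual transcription runs as it's consumed.
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")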
Requirements:
cd ai-voice-cloning
source ./venv/bin/activate
git clone https://github.com/guillaumekln/faster-whisper.git
pip install -e faster-whisper/
deactivate
./start.sh
Well, that formatting looks like shit, and I pasted the wrong code. Let's try again:
~line 2020 in src/utils.py:
So:
For now, I'm going to see how whisperX's integration of faster-whisper fares, and depending on how it goes I'll add in faster-whisper itself (which realistically will either wow my socks off or make me want to scream and shit myself in anger, so either way it'll get added as a backend, eventually).
I'm very mixed about it. As usual, I had a decent writeup about my issues, but I figured I'd scrap it, since it's mostly based on whisperX's implementation (which has really soured me on getting it to work).
Ignoring the headaches of getting it working after trashing several venvs, the main crux is that I don't trust its timestamps. The bane of whisperX for me before was its terrible timestamps, and what made me turn around and accept it again was the VAD filter it uses redeeming those terrible timestamps. Faster-whisper (and by extension whisperX-v3) doesn't have that cushion.
Maybe when I'm not so fried I'll give raw faster-whisper a shot rather than seeing the intermediate transcript in whisperX.