WhisperX models (Large issue) #326

Closed
opened 2023-08-15 19:56:38 +00:00 by SyntheticVoices · 1 comment

So all the WhisperX models work apart from the large model, and I get this error when I try to transcribe and process a dataset:

```
Loading Whisper model: large
Loading Whisper model: large
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
  File "H:\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "H:\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "H:\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "H:\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "H:\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "H:\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "H:\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "H:\ai-voice-cloning\src\webui.py", line 243, in prepare_dataset_proxy
    message = transcribe_dataset( voice=voice, language=language, skip_existings=skip_existings, progress=progress )
  File "H:\ai-voice-cloning\src\utils.py", line 2215, in transcribe_dataset
    load_whisper_model(language=language)
  File "H:\ai-voice-cloning\src\utils.py", line 3781, in load_whisper_model
    whisper_model = whisperx.load_model(model_name, device)
  File "H:\ai-voice-cloning\venv\lib\site-packages\whisperx\asr.py", line 50, in load_model
    model = WhisperModel(whisper_arch,
  File "H:\ai-voice-cloning\venv\lib\site-packages\faster_whisper\transcribe.py", line 117, in __init__
    model_path = download_model(
  File "H:\ai-voice-cloning\venv\lib\site-packages\faster_whisper\utils.py", line 61, in download_model
    raise ValueError(
ValueError: Invalid model size 'large', expected one of: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2
```

Of course, it's telling me to select large-v1 or large-v2. However, I don't have that option, and I don't know what I need to edit to make it available: [image]
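The check that fails can be approximated as follows. This is a minimal sketch, not faster_whisper's actual source: it assumes the accepted names are exactly the ones listed in the ValueError message above (that list depends on the installed faster_whisper version), and `check_model_size` is a hypothetical name. It shows why the bare name "large" is rejected while "large-v1" and "large-v2" pass.

```python
# Accepted names, copied from the ValueError message in the traceback.
# Assumption: this matches the installed faster_whisper version only.
VALID_MODEL_SIZES = (
    "tiny.en", "tiny",
    "base.en", "base",
    "small.en", "small",
    "medium.en", "medium",
    "large-v1", "large-v2",
)


def check_model_size(size: str) -> str:
    """Hypothetical helper: raise ValueError for any name not in VALID_MODEL_SIZES."""
    if size not in VALID_MODEL_SIZES:
        raise ValueError(
            "Invalid model size '%s', expected one of: %s"
            % (size, ", ".join(VALID_MODEL_SIZES))
        )
    return size


try:
    check_model_size("large")
except ValueError as err:
    # Prints the same message as the last line of the traceback.
    print(err)
```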

In \ai-voice-cloning\venv\lib\site-packages\faster_whisper\utils.py I just added "large" on line 22, and that seems to have worked. Just to check: was this the correct thing to do?

[image]
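An alternative to editing a file inside site-packages (the change is lost whenever faster_whisper is reinstalled or upgraded) is to translate the name before it reaches `whisperx.load_model`. The sketch below assumes "large" should map to "large-v2"; `MODEL_ALIASES`, `resolve_model_name` and `load_whisper` are hypothetical names, not part of ai-voice-cloning, WhisperX or faster_whisper. Only the `whisperx.load_model(model_name, device)` call is taken from the traceback.

```python
# Hypothetical alias table. Assumption: the bare name "large" should mean "large-v2".
MODEL_ALIASES = {"large": "large-v2"}


def resolve_model_name(name: str) -> str:
    """Return the name faster_whisper accepts; names without an alias pass through unchanged."""
    return MODEL_ALIASES.get(name, name)


def load_whisper(model_name: str, device: str):
    """Hypothetical wrapper: resolve the alias, then call whisperx.load_model."""
    # Imported lazily so resolve_model_name can be used without WhisperX installed.
    import whisperx

    return whisperx.load_model(resolve_model_name(model_name), device)


print(resolve_model_name("large"))   # large-v2
print(resolve_model_name("medium"))  # medium
```

Keeping the alias in the calling code leaves the installed packages untouched, so upgrading faster_whisper does not silently undo the fix.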

Reference: mrq/ai-voice-cloning#326