Out of memory errors and using whisperX #249

Open
opened 2023-05-23 06:03:49 +07:00 by Fresh12 · 12 comments

Hello, I hope it's okay if I continue to post questions here about my efforts to get this working, since there does not seem to be a more appropriate forum for this software.

I am having two major issues so far.

The first issue I am facing is when I attempt to use whisperx to create the dataset. When I select the option from the dropdown menu, it gives me the error `No module named 'whisperx'`. I've installed whisperx. I've tried starting the software in different conda environments with whisperx installed. I've tried activating the venv in the ai-voice-cloning directory and installing whisperx there, to no avail. It still gives me the same error. I'm not even using it through Colab, which is where others seem to have this problem. Below is the error from the console.

  File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/src/utils.py", line 3664, in load_whisper_model
    import whisper, whisperx

ModuleNotFoundError: No module named 'whisperx'

I can import whisperx in a regular Python session just fine, but it just won't work here for some reason.
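
A minimal sanity check (a sketch; none of this is in the repo) that could be dropped next to the failing import in src/utils.py to confirm which interpreter the UI process is actually running under and whether that interpreter can see whisperx:

```python
# Diagnostic sketch: print the interpreter this process runs under and whether
# it can resolve the whisperx package from its own site-packages.
import sys
import importlib.util

print("Python executable:", sys.executable)  # should point into the repo's venv
print("whisperx spec:", importlib.util.find_spec("whisperx"))  # None means not installed for this interpreter
```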

The second issue is that I am getting out-of-memory errors. This happens even if I shrink the dataset down to just 5 MB, the batch size to 2, and the gradient accumulation size to 2. I have a 3070 Mobile with 8 GB of VRAM, which is similar to other people who are successfully running training here, so I don't know what's up. Below is a reproduction of the traceback.


```
[Training] [2023-05-22T22:37:08.078146] Traceback (most recent call last):
[Training] [2023-05-22T22:37:08.078190]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/./src/train.py", line 64, in <module>
[Training] [2023-05-22T22:37:08.078218]     train(config_path, args.launcher)
[Training] [2023-05-22T22:37:08.078233]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/./src/train.py", line 31, in train
[Training] [2023-05-22T22:37:08.078247]     trainer.do_training()
[Training] [2023-05-22T22:37:08.078264]   File "/home/mainusera/Documents/ai-voice-cloning/modules/dlas/dlas/train.py", line 408, in do_training
[Training] [2023-05-22T22:37:08.078281]     metric = self.do_step(train_data)
[Training] [2023-05-22T22:37:08.078304]   File "/home/mainusera/Documents/ai-voice-cloning/modules/dlas/dlas/train.py", line 271, in do_step
[Training] [2023-05-22T22:37:08.078321]     gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-05-22T22:37:08.078339]   File "/home/mainusera/Documents/ai-voice-cloning/modules/dlas/dlas/trainer/ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-05-22T22:37:08.078356]     ns = step.do_forward_backward(
[Training] [2023-05-22T22:37:08.078373]   File "/home/mainusera/Documents/ai-voice-cloning/modules/dlas/dlas/trainer/steps.py", line 322, in do_forward_backward
[Training] [2023-05-22T22:37:08.078396]     self.scaler.scale(total_loss).backward()
[Training] [2023-05-22T22:37:08.078414]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
[Training] [2023-05-22T22:37:08.078431]     torch.autograd.backward(
[Training] [2023-05-22T22:37:08.078448]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
[Training] [2023-05-22T22:37:08.078465]     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[Training] [2023-05-22T22:37:08.078481]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
[Training] [2023-05-22T22:37:08.078504]     return user_fn(self, *args)
[Training] [2023-05-22T22:37:08.078520]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 141, in backward
[Training] [2023-05-22T22:37:08.078537]     outputs = ctx.run_function(*detached_inputs)
[Training] [2023-05-22T22:37:08.078554]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 875, in custom_forward
[Training] [2023-05-22T22:37:08.078584]     return module(*inputs, use_cache, output_attentions)
[Training] [2023-05-22T22:37:08.078602]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
[Training] [2023-05-22T22:37:08.078634]     return forward_call(*args, **kwargs)
[Training] [2023-05-22T22:37:08.078648]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 427, in forward
[Training] [2023-05-22T22:37:08.078677]     feed_forward_hidden_states = self.mlp(hidden_states)
[Training] [2023-05-22T22:37:08.078697]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
[Training] [2023-05-22T22:37:08.078850]     return forward_call(*args, **kwargs)
[Training] [2023-05-22T22:37:08.078883]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 355, in forward
[Training] [2023-05-22T22:37:08.078902]     hidden_states = self.act(hidden_states)
[Training] [2023-05-22T22:37:08.078918]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
[Training] [2023-05-22T22:37:08.078944]     return forward_call(*args, **kwargs)
[Training] [2023-05-22T22:37:08.078962]   File "/home/mainusera/Documents/ProjectFolder/ai-voice-cloning/venv/lib/python3.10/site-packages/transformers/activations.py", line 34, in forward
[Training] [2023-05-22T22:37:08.078979]     return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))
[Training] [2023-05-22T22:37:08.078997] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.79 GiB total capacity; 2.67 GiB already allocated; 2.50 MiB free; 2.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
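
As an aside, the last line of the traceback itself suggests trying `max_split_size_mb` to work around allocator fragmentation. A minimal sketch of applying that hint (the value 128 is only an illustrative guess, and it has to be set before torch initializes CUDA):

```python
# Sketch: apply the allocator hint from the error message above. Set the variable
# before torch touches CUDA, e.g. at the very top of src/train.py, or export it
# in the shell that launches training.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative value; tune as needed

import torch  # imported only after the env var is set so the allocator picks it up
```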

Can you run `whisperx` from the command line? (Not through importing it in a Python session, just from the prompt.)

I just tried, and while I can run whisperx directly, it still doesn't work through the AI cloning UI.

Try activating the venv in the directory you cloned the repo into and then run `git submodule update --remote`.

```
(base) mainusera@mainusera-COMP:~/Documents/ProjectFolder/ai-voice-cloning$ source venv/bin/activate
(venv) (base) mainusera@mainusera-COMP:~/Documents/ProjectFolder/ai-voice-cloning$ git submodule update --remote
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 5 (delta 4), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (5/5), 931 bytes | 232.00 KiB/s, done.
From https://git.ecker.tech/mrq/tortoise-tts
   c90ee7c..5ff00bf  main       -> origin/main
   c90ee7c..5ff00bf  master     -> origin/master
Submodule path 'modules/tortoise-tts': checked out '5ff00bf3bfa97e2c8e9f166b920273f83ac9d8f0'
```

Still gives me


```
 import whisper, whisperx
ModuleNotFoundError: No module named 'whisperx'
```

whether I run the ai-voice-cloning UI in (base) or (venv) (base) via `source venv/bin/activate`.

After activating the venv, does `pip list installed` show whisperx?

Okay, we're getting somewhere. Whisperx wasn't on the list despite being callable under the environment. I reinstalled all the software, and while the UI now detects whisperx, it still does not work properly.

```
Loading specialized model for language: en
Loading Whisper model: base.en
Loading Whisper model: base.en
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file models/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.
No huggingface token used, needs to be saved in environment variable, otherwise will throw error loading VAD model.

Could not download 'pyannote/segmentation' model.
It might be because the model is private or gated so make
sure to authenticate. Visit https://hf.co/settings/tokens to
create your access token and retry with:

   >>> Model.from_pretrained('pyannote/segmentation',
   ...                       use_auth_token=YOUR_AUTH_TOKEN)

If this still does not work, it might be because the model is gated:
visit https://hf.co/pyannote/segmentation to accept the user conditions.
Loaded Whisper model
Failed to transcribe: ./voices/SoundAbridged/third_00000.wav 'word-segments'
Failed to transcribe: ./voices/SoundAbridged/third_00001.wav 'word-segments'
Failed to transcribe: ./voices/SoundAbridged/third_00002.wav 'word-segments'
Missing dataset: ./training/SoundAbridged//whisper.json
```

I'm still searching for how to include the authorization token the error asks for, assuming it actually is the problem. There doesn't seem to be an obvious way.
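
For what it's worth, the check the error message itself suggests can be run directly to confirm whether token access to the gated model is really the problem (a minimal sketch; the token string is a placeholder created at https://hf.co/settings/tokens, and the user conditions at https://hf.co/pyannote/segmentation still need to be accepted):

```python
# Sketch based on the call shown in the error above: try loading the gated
# pyannote VAD segmentation model with an explicit Hugging Face token.
from pyannote.audio import Model

model = Model.from_pretrained(
    "pyannote/segmentation",
    use_auth_token="hf_xxx",  # placeholder token
)
print(model)
```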

> I'm still searching for how to include the authorization token the error asks for, assuming it actually is the problem. There doesn't seem to be an obvious way.

It's [in the Wiki](https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Training):

> !NOTE!: better transcription requires an HF-token. If you do not provide one within ./config/exec.json, you're better off just using another backend.

Thanks for your help so far. I still haven't completely gotten it to work. If you want to continue picking away at it with me, I'll try a few more things; otherwise I'll look into an alternative way to get better timestamping than base whisper.

Anyway, I got further, and it appears that the current whisperX is not completely compatible with ai-voice-cloning's implementation of it.

```
Unloaded TTS
Loading specialized model for language: en
Loading Whisper model: base.en
Loading Whisper model: base.en
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file models/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/7d5cf7bca4dcac7f943eb834bec0906a90da8c97/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.
Loaded Whisper model
Failed to transcribe: ./voices/SoundsAbridged/third_00000.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00001.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00002.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00003.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00004.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00005.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00006.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Failed to transcribe: ./voices/SoundsAbridged/third_00007.wav module 'whisperx' has no attribute 'transcribe_with_vad'
Missing dataset: ./training/SoundsAbridged//whisper.json
```

I inserted a transcribe_with_vad function found in https://github.com/m-bain/whisperX/issues/68 into transcribe.py, but that doesn't seem to solve it.

There's probably some way to tinker with your current install to get it working, but I think the most efficient thing to do is wipe it (save your datasets, of course), re-clone the repo, and run the install scripts over again.

I did. ai-voice-cloning calls `transcribe_with_vad` from whisperx through utils.py, but I downloaded the current whisperX repo and, according to grep, there is no 'transcribe_with_vad' anywhere in it.

As noted in issue #68 from the whisperX repo, "with VAD" is now the default, so you could try changing utils.py to just call `transcribe()` and see if that works. I always prepare my datasets externally, so I haven't tried it.

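A rough sketch of what that change could look like, assuming the newer whisperX API of `load_model()` / `load_audio()` with VAD already built into the pipeline's `transcribe()`; signatures differ between whisperX versions, so treat this as illustrative rather than a drop-in patch for utils.py:

```python
# Hypothetical shim: keep the old name utils.py expects, but forward to the
# newer whisperx API where VAD-based segmentation already happens inside transcribe().
import whisperx

def transcribe_with_vad(model, audio_path, vad_pipeline=None, **kwargs):
    # vad_pipeline is ignored: newer whisperx applies its own VAD internally.
    audio = whisperx.load_audio(audio_path)  # decode to the 16 kHz float array whisperx expects
    return model.transcribe(audio, **kwargs)  # 'model' assumed to come from whisperx.load_model(...)
```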

> But I downloaded the current whisperX repo and, according to grep, there is no 'transcribe_with_vad' anywhere in it.

Oh right, I forgot. I think v3 is when he broke everything with faster-whisper, which I tried for a day and found too un-good to warrant working around.

I suppose the issue is that I'm not actually printing the exception when it tries to load WhisperX, so it'll just remain unloaded.

mmmm, freeze it with

```
pip3 install -U git+https://github.com/m-bain/whisperX@stable
```

I believe. I can't quite remember how to specifically have pip install from Git from a branch.
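
For what it's worth, pip's general syntax for installing from a Git ref is `pip3 install -U "git+https://github.com/m-bain/whisperX@<branch-tag-or-commit>"`, so pinning the last pre-v3 commit hash would also work; whether that repo actually has a `stable` branch is something I haven't verified.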
