Getting an error when running the start.sh script after Whisper detection #483

Open
opened 2024-03-26 12:10:11 +00:00 by dhamaraiselvi · 0 comments

python3 ./src/main.py
Whisper detected
Traceback (most recent call last):
File "/home/dhamaraiselvi/ai-voice-cloning/src/utils.py", line 97, in
from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Bark detected
Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug False
Some weights of the model checkpoint at jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']

  • This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading tokenizer JSON: /home/dhamaraiselvi/ai-voice-cloning/modules/tortoise-tts/tortoise/../tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: /home/dhamaraiselvi/ai-voice-cloning/models/tortoise/autoregressive.pth
Traceback (most recent call last):
  File "/home/dhamaraiselvi/ai-voice-cloning/./src/main.py", line 27, in <module>
    tts = load_tts()
          ^^^^^^^^^^
  File "/home/dhamaraiselvi/ai-voice-cloning/src/utils.py", line 3666, in load_tts
    tts = TorToise_TTS(minor_optimizations=not args.low_vram, autoregressive_model_path=autoregressive_model, diffusion_model_path=diffusion_model, vocoder_model=vocoder_model, tokenizer_json=tokenizer_json, unsqueeze_sample_batches=args.unsqueeze_sample_batches, use_deepspeed=args.use_deepspeed)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/dhamaraiselvi/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 308, in init
    self.load_autoregressive_model(autoregressive_model_path)
    File "/home/dhamaraiselvi/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 391, in load_autoregressive_model
    self.autoregressive.load_state_dict(torch.load(self.autoregressive_model_path))
    File "/home/dhamaraiselvi/ai-voice-cloning/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
    Unexpected key(s) in state_dict: "gpt.h.0.attn.bias", "gpt.h.0.attn.masked_bias", "gpt.h.1.attn.bias", "gpt.h.1.attn.masked_bias", "gpt.h.2.attn.bias", "gpt.h.2.attn.masked_bias", "gpt.h.3.attn.bias", "gpt.h.3.attn.masked_bias", "gpt.h.4.attn.bias", "gpt.h.4.attn.masked_bias", "gpt.h.5.attn.bias", "gpt.h.5.attn.masked_bias", "gpt.h.6.attn.bias", "gpt.h.6.attn.masked_bias", "gpt.h.7.attn.bias", "gpt.h.7.attn.masked_bias", "gpt.h.8.attn.bias", "gpt.h.8.attn.masked_bias", "gpt.h.9.attn.bias", "gpt.h.9.attn.masked_bias", "gpt.h.10.attn.bias", "gpt.h.10.attn.masked_bias", "gpt.h.11.attn.bias", "gpt.h.11.attn.masked_bias", "gpt.h.12.attn.bias", "gpt.h.12.attn.masked_bias", "gpt.h.13.attn.bias", "gpt.h.13.attn.masked_bias", "gpt.h.14.attn.bias", "gpt.h.14.attn.masked_bias", "gpt.h.15.attn.bias", "gpt.h.15.attn.masked_bias", "gpt.h.16.attn.bias", "gpt.h.16.attn.masked_bias", "gpt.h.17.attn.bias", "gpt.h.17.attn.masked_bias", "gpt.h.18.attn.bias", "gpt.h.18.attn.masked_bias", "gpt.h.19.attn.bias", "gpt.h.19.attn.masked_bias", "gpt.h.20.attn.bias", "gpt.h.20.attn.masked_bias", "gpt.h.21.attn.bias", "gpt.h.21.attn.masked_bias", "gpt.h.22.attn.bias", "gpt.h.22.attn.masked_bias", "gpt.h.23.attn.bias", "gpt.h.23.attn.masked_bias", "gpt.h.24.attn.bias", "gpt.h.24.attn.masked_bias", "gpt.h.25.attn.bias", "gpt.h.25.attn.masked_bias", "gpt.h.26.attn.bias", "gpt.h.26.attn.masked_bias", "gpt.h.27.attn.bias", "gpt.h.27.attn.masked_bias", "gpt.h.28.attn.bias", "gpt.h.28.attn.masked_bias", "gpt.h.29.attn.bias", "gpt.h.29.attn.masked_bias".
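
For context, the final RuntimeError is the kind of error that usually indicates a version mismatch between the saved autoregressive checkpoint and the installed transformers library: newer transformers releases no longer register the GPT-2 attention buffers `attn.bias` / `attn.masked_bias`, so a checkpoint saved with an older version carries keys the current UnifiedVoice module does not expect. A minimal workaround sketch (not the project's official fix; `load_filtered_state_dict` is a hypothetical helper) is to drop those keys before calling `load_state_dict`:

```python
# Hypothetical workaround sketch, not the project's official fix.
# Checkpoints saved with older transformers versions include the GPT-2
# attention mask buffers ("attn.bias" / "attn.masked_bias"); newer versions
# no longer expect them, so we filter them out before loading.
import torch

def load_filtered_state_dict(model, checkpoint_path):
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # Drop the attention-mask buffers the current GPT-2 code no longer registers.
    filtered = {
        key: value
        for key, value in state_dict.items()
        if not key.endswith(".attn.bias") and not key.endswith(".attn.masked_bias")
    }
    model.load_state_dict(filtered)
    return model
```

Alternatively, calling `load_state_dict(..., strict=False)` ignores the mismatch, at the cost of also silencing genuinely missing keys.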