Starting the vall-e backend crashes #337

Open
opened 2023-08-23 01:09:31 +00:00 by Bluebomber182 · 7 comments

```
./start.sh --tts-backend="vall-e"
Whisper detected
Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/src/utils.py", line 88, in <module>
    from vall_e.inference import TTS as VALLE_TTS
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/inference.py", line 15, in <module>
    from .train import load_engines
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/train.py", line 4, in <module>
    from .data import create_train_val_dataloader
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/data.py", line 597, in <module>
    @cfg.diskcache()
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/config.py", line 460, in diskcache
    return diskcache.Cache(self.cache_dir).memoize
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/config.py", line 455, in cache_dir
    return ".cache" / self.relpath
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/config.py", line 28, in relpath
    return Path(self.cfg_path)
  File "/usr/lib/python3.10/pathlib.py", line 960, in __new__
    self = cls._from_parts(args)
  File "/usr/lib/python3.10/pathlib.py", line 594, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.10/pathlib.py", line 578, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/src/utils.py", line 105, in <module>
    import bark
ModuleNotFoundError: No module named 'bark'

Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/./src/main.py", line 24, in <module>
    webui = setup_gradio()
  File "/home/user/ai-voice-cloning/src/webui.py", line 663, in setup_gradio
    EXEC_SETTINGS['valle_model'] = gr.Dropdown(choices=valle_models, label="VALL-E Model Config", value=args.valle_model if args.valle_model else valle_models[0])
IndexError: list index out of range
```
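
For context, the closing IndexError comes from the dropdown default falling back to the first entry of an empty model list. A minimal sketch of that failure, with `args` and `valle_models` as stand-in names for the web UI's parsed options and its discovered configs:

```python
from types import SimpleNamespace

args = SimpleNamespace(valle_model=None)  # no model passed on the command line
valle_models = []                         # no config discovered under ./training/

try:
    # Same shape as the Dropdown default in webui.py: fall back to the first entry.
    value = args.valle_model if args.valle_model else valle_models[0]
except IndexError as e:
    print(e)  # list index out of range
```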

Owner

Since I still have yet to get around to working on the web UI to update the integration for normal people, the web UI expects a model to be present under `./training/`. For example:

  • `./training/valle/`, which contains `config.yaml` and `ckpt`.
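
A rough sketch of the discovery the web UI is assumed to do here (the function and glob pattern are illustrative, not the actual webui.py code):

```python
from pathlib import Path

def find_valle_configs(training_dir: str = "./training") -> list[str]:
    # Assumed behavior: any ./training/<name>/config.yaml sitting next to a
    # ckpt/ directory counts as a usable VALL-E model config.
    return [
        str(cfg)
        for cfg in Path(training_dir).glob("*/config.yaml")
        if (cfg.parent / "ckpt").is_dir()
    ]

print(find_valle_configs())  # e.g. ['training/valle/config.yaml']
```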
Author

Okay, I placed the valle folder. Correct me if I did anything wrong, because I still got errors. It's currently placed like this:
`/ai-voice-cloning/training/valle/ckpt/ar-retnet-4/fp32.pth`
`/ai-voice-cloning/training/valle/ckpt/nar-retnet-4/fp32.pth`
`/ai-voice-cloning/training/valle/config.yaml`

```
./start.sh --tts-backend="vall-e"
Whisper detected
Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/src/utils.py", line 88, in <module>
    from vall_e.inference import TTS as VALLE_TTS
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/inference.py", line 15, in <module>
    from .train import load_engines
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/train.py", line 4, in <module>
    from .data import create_train_val_dataloader
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/data.py", line 597, in <module>
    @cfg.diskcache()
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/config.py", line 460, in diskcache
    return diskcache.Cache(self.cache_dir).memoize
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/config.py", line 455, in cache_dir
    return ".cache" / self.relpath
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/config.py", line 28, in relpath
    return Path(self.cfg_path)
  File "/usr/lib/python3.10/pathlib.py", line 960, in __new__
    self = cls._from_parts(args)
  File "/usr/lib/python3.10/pathlib.py", line 594, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.10/pathlib.py", line 578, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/src/utils.py", line 105, in <module>
    import bark
ModuleNotFoundError: No module named 'bark'

Running on local URL: http://XXX.X.X.X:XXXX

To create a public link, set share=True in launch().
Loading VALL-E... (Config: None)
Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/./src/main.py", line 27, in <module>
    tts = load_tts()
  File "/home/user/ai-voice-cloning/src/utils.py", line 3629, in load_tts
    tts = VALLE_TTS(config=args.valle_model)
NameError: name 'VALLE_TTS' is not defined
```
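
The final NameError follows from the first traceback above: the `from vall_e.inference import TTS as VALLE_TTS` line failed inside a guarded import, so the name was never bound. A minimal sketch of that pattern (not the actual src/utils.py block):

```python
# If the optional import fails, VALLE_TTS is never defined, so the later call
# site raises NameError instead of the original error.
try:
    from vall_e.inference import TTS as VALLE_TTS
except Exception as e:
    print(f"Error while importing VALL-E: {e}")  # swallowed here

def load_tts():
    return VALLE_TTS(config=None)  # NameError: name 'VALLE_TTS' is not defined

try:
    load_tts()
except NameError as e:
    print(e)
```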

Owner

> TypeError: expected str, bytes or os.PathLike object, not NoneType

~~Right. I forgot to try and figure out an elegant solution to that during my infrequent inference tests using the web UI.~~

~~Use `./start.sh --tts-backend="vall-e" yaml="./training/valle/config.yaml"`.~~

Remedied in [mrq/vall-e](https://git.ecker.tech/mrq/vall-e/) commit [`d1065984`](https://git.ecker.tech/mrq/vall-e/commit/d106598403e3764025dfc9bd9f6868fdf90ccfc9).
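
For reference, the original TypeError is just pathlib choking on a config path that was never set; a tiny reproduction on Python 3.10:

```python
from pathlib import Path

cfg_path = None  # stands in for vall_e's cfg.cfg_path when no YAML is supplied
try:
    Path(cfg_path)
except TypeError as e:
    print(e)  # expected str, bytes or os.PathLike object, not NoneType
```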

Author

Okay, that did it! Thank you for the quick response!

Author

Using this command works:
`./start.sh --tts-backend="vall-e" yaml="./training/valle/config.yaml"`
But using this command brings up an error:
`./start.sh --tts-backend="vall-e"`
```
Whisper detected
[2023-08-23 09:08:37,714] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-08-23 09:08:40,229] [INFO] [comm.py:631:init_distributed] cdb=None
[2023-08-23 09:08:40,229] [INFO] [comm.py:662:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
VALL-E detected
Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/src/utils.py", line 105, in <module>
    import bark
ModuleNotFoundError: No module named 'bark'

Running on local URL: http://XXX.X.X.X:XXXX

To create a public link, set share=True in launch().
Loading VALL-E... (Config: None)
Traceback (most recent call last):
  File "/home/user/ai-voice-cloning/./src/main.py", line 27, in <module>
    tts = load_tts()
  File "/home/user/ai-voice-cloning/src/utils.py", line 3629, in load_tts
    tts = VALLE_TTS(config=args.valle_model)
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/inference.py", line 55, in __init__
    self.load_models()
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/inference.py", line 64, in load_models
    engines = load_engines()
  File "/home/user/ai-voice-cloning/modules/vall-e/vall_e/utils/trainer.py", line 64, in load_engines
    models = get_models(cfg.models.get())
TypeError: Models.get() missing 1 required positional argument: 'self'
```
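
That last TypeError is the usual sign of a method being called on a class rather than on an instance; a minimal, hypothetical analogue (not vall_e's actual Models container):

```python
class Models:
    def get(self):
        return ["ar", "nar"]

models = Models  # the class itself, e.g. because the config was never instantiated
try:
    models.get()
except TypeError as e:
    print(e)  # Models.get() missing 1 required positional argument: 'self'

print(Models().get())  # works once an instance exists: ['ar', 'nar']
```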


Having the same issue as the last one on the default launch through `start.bat --tts-backend="vall-e"`, and getting `NameError: name 'VALLE_TTS' is not defined` when launching with `yaml="./training/valle/config.yaml"`.

The config file, the two .pth checkpoints, and data.h5 are placed in the training/valle folder.

I assume it is easier to launch VALL-E through its own separate fork rather than the web UI?


Ok, I solved the issue and was able to launch it on Windows. What is written below are my steps until finally generating something with VALL-E through the web UI. I am not a coder, so consider that I just tried to fix my current errors without thinking about how it would break other code and functions. This was quite CBT, and I hope in the future it will be as easy to set up as tortoise.

First of all, I launch it as recommended in this thread through
`start.bat --tts-backend="vall-e" yaml="./training/valle/config.yaml"`
Only that way does it launch with more or less reasonable errors.

  1. Then I hit the already known error `NameError: name 'VALLE_TTS' is not defined`, and even though the run crashed on it once, I tried pulling the import out of the try/except statement at the beginning of the utils.py file. Eventually it started to work both ways, with the try/except or without.

  2. The next error was one that I saw together with the previous one: `ModuleNotFoundError: No module named 'deepspeed'`. So I went to check `modules\vall-e\vall_e\engines\__init__.py` and noticed the check for the engine choice. I remembered that DeepSpeed either doesn't work on Windows or it does, but I definitely don't have it pre-installed with the web UI, so I just commented out the whole if/elif statement and pasted `from .base import Engine` instead (see the sketch after this list).

  3. After this I got the error:

```
Loading VALL-E... (Config: None)
ar parameter count: 206884865
nar parameter count: 206882816
Traceback (most recent call last):
  File "G:\Programs Fast\ai-voice-cloning\src\main.py", line 27, in <module>
    tts = load_tts()
  File "G:\Programs Fast\ai-voice-cloning\src\utils.py", line 3651, in load_tts
    tts = VALLE_TTS(config=args.valle_model)
  File "g:\programs fast\ai-voice-cloning\modules\vall-e\vall_e\inference.py", line 63, in __init__
    self.load_models()
  File "g:\programs fast\ai-voice-cloning\modules\vall-e\vall_e\inference.py", line 74, in load_models
    engines = load_engines()
  File "g:\programs fast\ai-voice-cloning\modules\vall-e\vall_e\utils\trainer.py", line 127, in load_engines
    engines.load_checkpoint()
  File "g:\programs fast\ai-voice-cloning\modules\vall-e\vall_e\engines\base.py", line 312, in load_checkpoint
    self.set_lr(cfg.hyperparameters.learning_rate)
  File "g:\programs fast\ai-voice-cloning\modules\vall-e\vall_e\engines\base.py", line 320, in set_lr
    engine.set_lr(lr)
  File "g:\programs fast\ai-voice-cloning\modules\vall-e\vall_e\engines\base.py", line 193, in set_lr
    for param_group in self.optimizer.param_groups:
AttributeError: 'NoneType' object has no attribute 'param_groups'
```

So I had no idea what it was, but set_lr made me think it was related to the config file. So instead of using the config.yaml from Hugging Face, I used the one that came pre-installed with vall-e (I have no idea why they are different). And it went further.

  4. I got `FileNotFoundError: [Errno 2] No such file or directory: 'training\\valle\\ckpt\\ar-retnet-4\\fp32.pth'`. I remembered that I had downloaded everything piecemeal from Hugging Face without the folders, so this time I recreated the folder hierarchy from Hugging Face in my local valle folder.

  5. So now I launch the web UI and get this on a generation: `RuntimeError: espeak not installed on your system`. I installed it on my system from https://github.com/espeak-ng/espeak-ng/releases. It still didn't work. I added it to the PATH variable and also created two system variables myself, "PHONEMIZER_ESPEAK_LIBRARY" with the direct path to libespeak-ng.dll, and "PHONEMIZER_ESPEAK_PATH" with the whole folder, so the paths looked like "C:\Program Files\eSpeak NG\libespeak-ng.dll" and "C:\Program Files\eSpeak NG". Still didn't work! So I just went to `.\venv\Lib\site-packages\phonemizer\backend\espeak\base.py` and added at the beginning, where all the imports are:

```python
from phonemizer.backend.espeak.wrapper import EspeakWrapper

_ESPEAK_LIBRARY = 'C:\Program Files\eSpeak NG\libespeak-ng.dll'
EspeakWrapper.set_library(_ESPEAK_LIBRARY)
```
  6. And wow, it worked and even generated something, but that something was shorter than one second and sounded super bad. I increased samples from 4 to 16 to 64 to 256 to 512 to 1024, and sadly almost everywhere it cuts off most of the final audio, and at 512, where it was a full sentence, it was absolutely the wrong voice with tons of artifacts. Is it intended to be like that? Or is it my broken implementation where some dependencies are skipped?
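
As a rough illustration of the edit described in step 2, here is a hypothetical fallback (not the actual `modules\vall-e\vall_e\engines\__init__.py`): use the configured backend only if its optional dependency imports, otherwise fall back to the plain base Engine, which is what hard-coding `from .base import Engine` amounts to.

```python
import importlib

def pick_engine_backend(preferred: str = "deepspeed") -> str:
    # Hypothetical helper mirroring the if/elif check on the engine choice.
    if preferred == "deepspeed":
        try:
            importlib.import_module("deepspeed")
            return "deepspeed"
        except ModuleNotFoundError:
            return "local"  # fall back to the plain PyTorch Engine from .base
    return "local"

print(pick_engine_backend())  # "local" on a Windows install without deepspeed
```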

Either way, I hope this will help someone at least launch VALL-E on Windows in the web UI. And I wish mrq luck in generating a good base model! I am still not sure I can ever train such a complicated model myself, but I hope to see extra settings and a new base model in the near future. I use TTS to copy the voices of game characters, which doesn't work on base models except TorToiSe fine-tuned for specific voices. I haven't read the paper in detail, but I remember VALL-E takes only 3 seconds of a sample or something, which is very small in my opinion. I also hope to see a UI setting to pick between AR, NAR, and both, even if I don't know how that works yet.
