Error when resuming a training state. #106

ThrowawayAccount01 · 2023-03-10T02:55:46Z

When trying to resume a training state, I get the following error:

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Spawning process:  train.bat ./training/21/train.yaml
[Training] [2023-03-10T10:53:29.641279]
[Training] [2023-03-10T10:53:29.645292] (venv) C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-03-10T10:53:31.537422] Using BitsAndBytes ADAMW optimizations
[Training] [2023-03-10T10:53:31.540420] Disabled distributed training.
[Training] [2023-03-10T10:53:31.543944] Traceback (most recent call last):
[Training] [2023-03-10T10:53:31.547944]   File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\train.py", line 94, in <module>
[Training] [2023-03-10T10:53:31.551452]     train(args.opt, args.launcher)
[Training] [2023-03-10T10:53:31.555462]   File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\train.py", line 80, in train
[Training] [2023-03-10T10:53:31.559460]     trainer.init(yaml, opt, launcher)
[Training] [2023-03-10T10:53:31.563556]   File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\./modules/dlas\codes\train.py", line 45, in init
[Training] [2023-03-10T10:53:31.566555]     resume_state = torch.load(opt['path']['resume_state'], map_location=map_cuda_to_correct_device)
[Training] [2023-03-10T10:53:31.571072]   File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\serialization.py", line 771, in load
[Training] [2023-03-10T10:53:31.574079]     with _open_file_like(f, 'rb') as opened_file:
[Training] [2023-03-10T10:53:31.578080]   File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\serialization.py", line 270, in _open_file_like
[Training] [2023-03-10T10:53:31.581591]     return _open_file(name_or_buffer, mode)
[Training] [2023-03-10T10:53:31.585613]   File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\serialization.py", line 251, in __init__
[Training] [2023-03-10T10:53:31.587612]     super(_open_file, self).__init__(open(name, mode))
[Training] [2023-03-10T10:53:31.589612] FileNotFoundError: [Errno 2] No such file or directory: "./training/21/finetune/training_state//158.state'"

ThrowawayAccount01 commented

2023-03-10 03:21:16 +00:00

I believe the error is with `[Errno 2] No such file or directory: "./training/21/finetune/training_state//158.state'"`

When saving the training configuration, an extra `'` is added at the end of the path automatically. Manually editing it out of the yaml fixed it.
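
For anyone who hits the same stray quote, here is a minimal sketch of a defensive check that could run on the `resume_state` value before it reaches `torch.load`. The helper names `clean_state_path` and `resolve_resume_state` are hypothetical and not part of DLAS or this repo. It assumes a real state path never legitimately starts or ends with a quote character.

```python
import os


def clean_state_path(raw: str) -> str:
    """Strip surrounding whitespace and stray quote characters from a path string.

    Hypothetical helper: assumes real paths never begin or end with ' or ".
    """
    return raw.strip().strip("'\"")


def resolve_resume_state(raw: str) -> str:
    """Return the cleaned path, or fail early with a message showing both values."""
    path = clean_state_path(raw)
    if not os.path.isfile(path):
        # Report the raw value too, so a stray quote in the yaml is easy to spot.
        raise FileNotFoundError(
            f"resume_state not found: {path!r} (raw config value: {raw!r})"
        )
    return path
```

With the value from the traceback above, `clean_state_path` drops the trailing `'` and leaves the rest of the path untouched, including the doubled slash.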

ThrowawayAccount01 commented

2023-03-10 03:27:14 +00:00

Also, after a fresh install, voicefixer breaks again and librosa has to be downgraded again manually to 0.8.0 for it to work. 'update-force.bat' hasn't been updated for this fix yet, I believe?

mrq commented

2023-03-10 03:50:59 +00:00

> When saving the training configuration, an extra ' is added at the end of the path automatically. Manually editing it out of the yaml fixed it.

Ah, I see. Fixed in commit c92b006129bf6b91557e3558816fc8f3a601cb3b. Or rather, maybe the blame is on Sublime Text automatically putting the closing quote and I forgot to erase it.

> 'update-force.bat' hasn't been updated for this fix yet, I believe?

I have librosa frozen to 0.8.1 in either tortoise-tts' requirements or DLAS's. I suppose I'll just have the setup scripts uninstall and install the right librosa and einops.
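
As a rough sketch of what "uninstall and install the right librosa" could look like when done from Python rather than a batch file: the functions below are hypothetical and are not the repo's actual setup scripts. The pinned version is passed in as a parameter. Only `importlib.metadata` and the standard `pip uninstall -y` / `pip install pkg==version` invocations are used.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_commands(package: str, wanted: str, installed):
    """Build the pip commands needed to end up with exactly package==wanted.

    Returns an empty list when the wanted version is already installed.
    """
    if installed == wanted:
        return []
    commands = []
    if installed is not None:
        # Remove whatever version a dependency pulled in before pinning.
        commands.append([sys.executable, "-m", "pip", "uninstall", "-y", package])
    commands.append([sys.executable, "-m", "pip", "install", f"{package}=={wanted}"])
    return commands
```

Each command returned by `pin_commands("librosa", "0.8.1", installed_version("librosa"))` could then be passed to `subprocess.check_call` from inside the venv.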

mrq closed this issue

2023-03-10 03:50:59 +00:00
