Error when running start.bat #6

Closed
opened 2023-02-19 03:40:06 +07:00 by ThrowawayAccount01 · 24 comments

I get this error when running start.bat:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating TorToiSe... (using model: None)
Traceback (most recent call last):
  File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\main.py", line 22, in <module>
    tts = setup_tortoise()
  File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\utils.py", line 502, in setup_tortoise
    tts = TextToSpeech(minor_optimizations=not args.low_vram, autoregressive_model_path=args.autoregressive_model)
TypeError: __init__() got an unexpected keyword argument 'autoregressive_model_path'

I am running:

- Python 3.9
- RTX 3080


You'll need to run:

call .\venv\Scripts\activate.bat
pip install -U git+https://git.ecker.tech/mrq/tortoise-tts.git

Just for my curiosity:

  • did you happen to move over .\tortoise-tts\tortoise-venv over to .\ai-voice-cloning\venv\?
  • did you use the update.bat script to update earlier, or just a git pull?

Either way, I updated mrq/tortoise-tts to add autoregressive model loading, and I realized after the fact that one of the problems with splitting the two repos up is that you'll need to re-install it through pip every time I update that repo too. I might have a better way to go about it in the future instead.
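
If the -U flag alone doesn't pick up the new code (as happens further down this thread), a more forceful reinstall is worth a shot. This is only a sketch using standard pip flags, and it assumes the package installs under the name tortoise, as the pip uninstall suggestion later in the thread implies:

call .\venv\Scripts\activate.bat
pip uninstall -y tortoise
pip install --force-reinstall --no-deps git+https://git.ecker.tech/mrq/tortoise-tts.git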


I'm still getting the same error:

[snip]

This is a clean install; no files were migrated over.

I have used update.bat as well as update-force.bat before running start.bat.


Strange.

I suppose some surgery is needed. Save tortoise/api.py and place it under .\venv\Lib\site-packages\tortoise\api.py. I'm not sure why it's not updating even with the -U flag, although I wonder if you really need to pip uninstall tortoise first like someone else mentioned earlier.

I'll look into a better way of integrating mrq/tortoise-tts to avoid these issues in the future.
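
If you'd rather not copy the file by hand, fetching it straight into the venv should amount to something like this (a sketch: curl ships with recent Windows 10/11, and I'm assuming the usual Gitea raw-file URL pattern of /raw/branch/main/ applies here):

curl -L -o .\venv\Lib\site-packages\tortoise\api.py https://git.ecker.tech/mrq/tortoise-tts/raw/branch/main/tortoise/api.py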


Overwriting api.py seems to have worked.

Upon trying to prepare a dataset, I get the following error:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating TorToiSe... (using model: None)
Hardware acceleration found: cuda
TorToiSe initialized, ready for generation.
Loading Whisper model: base
Transcribing file: ./voices\h\hapi01.wav
Traceback (most recent call last):
  File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
AttributeError: module 'ffmpeg' has no attribute 'input'

I have already set up ffmpeg in my environment path variables.


That's some other funky thing that sometimes crops up, especially in a Colab notebook, and I swear one method worked one time while the next time it needed a different remedy.

Try:

pip uninstall ffmpeg-python
pip install ffmpeg-python
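
If reinstalling ffmpeg-python alone doesn't do it, one possible culprit (an assumption on my part, not something confirmed in this thread) is a stray ffmpeg package from PyPI shadowing the ffmpeg module that ffmpeg-python provides. A quick check-and-clean sketch, run inside the venv:

pip uninstall -y ffmpeg ffmpeg-python
pip install ffmpeg-python
python -c "import ffmpeg; print(ffmpeg.input)"

If that last command prints a function instead of raising AttributeError, whisper should be able to load audio again.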

I managed to prepare the dataset and generate configuration. However, I have been stuck at this step for about an hour now:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating voice-fixer
Error occurred while tring to initialize voicefixer: PytorchStreamReader failed reading zip archive: failed finding central directory
Initializating TorToiSe... (using model: None)
Hardware acceleration found: cuda
TorToiSe initialized, ready for generation.
Loading Whisper model: base
Transcribing file: ./voices\h\hapi01.wav
Transcribed file: ./voices\h\hapi01.wav, 5 found.
Transcribing file: ./voices\h\hapi02.wav
Transcribed file: ./voices\h\hapi02.wav, 7 found.
Transcribing file: ./voices\h\hapi03.wav
Transcribed file: ./voices\h\hapi03.wav, 4 found.
Batch size is larger than your dataset, clamping...
Unloading TTS to save VRAM.
Spawning process:  train.bat ./training/h/train.yaml

Another thing to note is that turning on voice fixer also gives the error shown above.


Right, I forgot to have it still print to console if Verbose Output or whatever is unchecked. Restart the UI, run training with Verbose Output checked, and see what it's getting hung up on. ~~I'll push a quick commit to print all output anyways~~ Commit 485319c2bb will restore printing to console regardless of that setting.

As for the voicefixer thing, as per the documentation, the download was interrupted and for some reason it's not smart enough to restart downloading on its own. Open %USERPROFILE%\.cache\ and delete voicefixer.
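
On Windows that boils down to something like the following from a command prompt (a sketch; adjust the path if your cache lives somewhere else):

rmdir /s /q "%USERPROFILE%\.cache\voicefixer"

voicefixer should then re-download its checkpoints the next time it initializes.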


~~I deleted the voicefixer folder and upon restarting, it is stuck on this step for about 10 minutes now:~~

Ignore what I wrote above, I restarted it and this time it downloaded successfully.

I proceeded to run the training with Verbose Output checked. This is the output it is stuck on:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating voice-fixer
initialized voice-fixer
Initializating TorToiSe... (using model: None)
Hardware acceleration found: cuda
TorToiSe initialized, ready for generation.
Loading Whisper model: base
Transcribing file: ./voices\h\hapi01.wav
Transcribed file: ./voices\h\hapi01.wav, 5 found.
Transcribing file: ./voices\h\hapi02.wav
Transcribed file: ./voices\h\hapi02.wav, 9 found.
Transcribing file: ./voices\h\hapi03.wav
Transcribed file: ./voices\h\hapi03.wav, 4 found.
Batch size is larger than your dataset, clamping...
Unloading TTS to save VRAM.
Spawning process:  train.bat ./training/h/train.yaml

[snip]

[Training] [2023-02-19T14:53:00.989793] RuntimeError: DataLoader worker (pid(s) 10424, 14060, 14396, 16924) exited unexpectedly

> OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\lib\torch_python.dll" or one of its dependencies.

I'll assume you're OOMing (system RAM, not VRAM).

Outside of the obvious of closing out processes, I'll suggest checking Defer TTS Load in the web UI and restarting the UI, just to make sure no TTS model gets loaded during training.


I have closed most processes and checked Defer TTS Load and tried to run it again. I am stuck on this error:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Unloading TTS to save VRAM.
Spawning process:  train.bat ./training/h/train.yaml

[snip]

[Training] [2023-02-19T15:33:53.651694] RuntimeError: [enforce fail at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 16777216 bytes.

Given the revamped configuration generation process and (attempting to) unload all models on training, you might be able to get it working again if you lower the Mega Batch Factor or whatever setting down to 1.


Still no dice I'm afraid. It keeps giving me Pytorch/Memory OOM errors no matter how low I tweak the settings. For now I'll have to use colab to train I suppose.


I guess even 3080s can't train it, as there's another user who can't train off a 3080 either.

I'll cobble together a way to load and finetune it as float16 and see if that gets VRAM consumption down, maybe. I only worry about the performance/quality problems from it, but I suppose it's better than nothing.


As I mentioned in #17 (comment), I added an experimental way to train fully at half-precision that will convert the original model to float16. It fails on 8GiB a few steps into training, but peaks at 52% VRAM usage on a machine with 16GiB of VRAM, so it might work on a 3080. You're welcome to try, but I have zero guarantees it'll work or produce usable output yet, as I still need to actually finetune a model at half precision.


I tried to generate configuration with half-precision enabled, but it gives me this error:

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1024, in process_api
    result = await self.call_function(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 836, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\webui.py", line 220, in save_training_settings_proxy
    messages.append(save_training_settings(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\utils.py", line 695, in save_training_settings
    if not os.path.exists(get_halfp_model()):
NameError: name 'get_halfp_model' is not defined

Oops, don't know how I managed to get it working for me, fixed in commit 93b061fb4d.


Updated and upon running start.bat I get this error:

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\main.py", line 19, in <module>
    webui = setup_gradio()
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\webui.py", line 339, in setup_gradio
    history_voices = gr.Dropdown(choices=result_voices, label="Voice", type="value", value=result_voices[0] if len(results_voices) > 0 else "")
NameError: name 'results_voices' is not defined

Fixed, I had it right, but I copied over the typo fix.


I still get an error when trying to save the training configuration. I think it's still the same one from before?

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1024, in process_api
    result = await self.call_function(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 836, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\webui.py", line 220, in save_training_settings_proxy
    messages.append(save_training_settings(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\utils.py", line 695, in save_training_settings
    if not os.path.exists(get_halfp_model()):
NameError: name 'get_halfp_model' is not defined

For sure really fixed in 526a430c2a or I'm blowing my brains out. I don't know how it reverted.


I tried running a very small dataset of 16 clips on half precision. Unfortunately, it still gives OOM errors.

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Spawning process:  train.bat ./training/h/train.yaml

[snip]

[Training] [2023-02-22T22:06:16.263674] 23-02-22 22:06:16.262 - INFO: [epoch:  0, iter:       0, lr:(1.000e-05,1.000e-05,)] step: 0.0000e+00 samples: 1.7000e+01 megasamples: 1.7000e-05 iteration_rate: 3.9558e-01 loss_text_ce: 4.1261e+00 loss_mel_ce: 2.8033e+00 loss_gpt_total: 2.8446e+00 grad_scaler_scale: 6.5536e+04 learning_rate_gpt_0: 1.0000e-05 learning_rate_gpt_1: 1.0000e-05 total_samples_loaded: 1.7000e+01 percent_skipped_samples: 1.0526e-01 percent_conditioning_is_self: 8.9474e-01 gpt_conditioning_encoder: 1.5581e+00 gpt_gpt: 2.1085e+00 gpt_heads: 5.0641e-01
[Training] [2023-02-22T22:06:16.263674] 23-02-22 22:06:16.262 - INFO: Saving models and training states.
[Training] [2023-02-22T22:06:23.283171]
[Training] [2023-02-22T22:06:23.866625] 100%|##########| 1/1 [00:21<00:00, 21.42s/it]
[Training] [2023-02-22T22:06:23.866625] 100%|##########| 1/1 [00:22<00:00, 22.00s/it]
[Training] [2023-02-22T22:06:23.866625]
[Training] [2023-02-22T22:06:32.493621]   0%|          | 0/1 [00:00<?, ?it/s]C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\models\audio\tts\tacotron2\taco_utils.py:17: WavFileWarning: Chunk (non-data) not understood, skipping it.
[Training] [2023-02-22T22:06:32.493621]   sampling_rate, data = read(full_path)
[Training] [2023-02-22T22:06:38.031323]
[Training] [2023-02-22T22:06:38.031323]   0%|          | 0/1 [00:14<?, ?it/s]
[Training] [2023-02-22T22:06:38.031323] Traceback (most recent call last):
[Training] [2023-02-22T22:06:38.031323]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\train.py", line 62, in <module>
[Training] [2023-02-22T22:06:38.031323]     train(args.opt, args.launcher)
[Training] [2023-02-22T22:06:38.031323]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\train.py", line 53, in train
[Training] [2023-02-22T22:06:38.031323]     trainer.do_training()
[Training] [2023-02-22T22:06:38.031323]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas\codes\train.py", line 330, in do_training
[Training] [2023-02-22T22:06:38.032827]     self.do_step(train_data)
[Training] [2023-02-22T22:06:38.032827]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas\codes\train.py", line 211, in do_step
[Training] [2023-02-22T22:06:38.032827]     gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)
[Training] [2023-02-22T22:06:38.032827]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 302, in optimize_parameters
[Training] [2023-02-22T22:06:38.032827]     ns = step.do_forward_backward(state, m, step_num, train=train_step, no_ddp_sync=(m+1 < self.batch_factor))
[Training] [2023-02-22T22:06:38.032827]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\trainer\steps.py", line 246, in do_forward_backward
[Training] [2023-02-22T22:06:38.033832]     injected = inj(local_state)
[Training] [2023-02-22T22:06:38.033832]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
[Training] [2023-02-22T22:06:38.033832]     return forward_call(*input, **kwargs)
[Training] [2023-02-22T22:06:38.033832]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\trainer\injectors\base_injectors.py", line 93, in forward
[Training] [2023-02-22T22:06:38.046344]     results = method(*params, **self.args)
[Training] [2023-02-22T22:06:38.046344]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
[Training] [2023-02-22T22:06:38.046344]     return forward_call(*input, **kwargs)
[Training] [2023-02-22T22:06:38.046344]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
[Training] [2023-02-22T22:06:38.052852]     return self.module(*inputs[0], **kwargs[0])
[Training] [2023-02-22T22:06:38.052852]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
[Training] [2023-02-22T22:06:38.052852]     return forward_call(*input, **kwargs)
[Training] [2023-02-22T22:06:38.052852]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\models\audio\tts\unified_voice2.py", line 425, in forward
[Training] [2023-02-22T22:06:38.058357]     loss_mel = F.cross_entropy(mel_logits, mel_targets.long())
[Training] [2023-02-22T22:06:38.058357]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\functional.py", line 3026, in cross_entropy
[Training] [2023-02-22T22:06:38.059362]     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
[Training] [2023-02-22T22:06:38.059362] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 10.00 GiB total capacity; 8.85 GiB already allocated; 0 bytes free; 9.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It seems to at least be training somewhat (albeit slowly at 20s/it) before OOMing; try lowering the batch size to 2.
If not, then use this copium in place of your train.bat:

call .\venv\Scripts\activate.bat
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:64
python ./src/train.py -opt "%1"
deactivate
pause

And if it absolutely will not let you after trying each of those, then I suppose I'll need to find some more VRAM savings somewhere else, like training out a model with smaller network parameters.


I've had a breakthrough with being able to train even on my 2060.

Refer to #25.

mrq closed this issue 2023-02-23 06:26:15 +07:00