Error when running start.bat #6

Closed
opened 2023-02-19 03:40:06 +07:00 by ThrowawayAccount01 · 24 comments

I get this error when running start.bat:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating TorToiSe... (using model: None)
Traceback (most recent call last):
  File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\main.py", line 22, in <module>
    tts = setup_tortoise()
  File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\utils.py", line 502, in setup_tortoise
    tts = TextToSpeech(minor_optimizations=not args.low_vram, autoregressive_model_path=args.autoregressive_model)
TypeError: __init__() got an unexpected keyword argument 'autoregressive_model_path'

I am running:

- Python 3.9
- RTX 3080


You'll need to run:

call .\venv\Scripts\activate.bat
pip install -U git+https://git.ecker.tech/mrq/tortoise-tts.git

Just for my curiosity:

  • did you happen to move over .\tortoise-tts\tortoise-venv over to .\ai-voice-cloning\venv\?
  • did you use the update.bat script to update earlier, or just a git pull?

Either way, I updated mrq/tortoise-tts to add autoregressive model loading, and I realized after the fact that one of the problems with splitting the two repos up is that you'll need to re-install it through pip every time I update that repo too. I might have a better way to go about it in the future instead.
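
If the -U flag alone doesn't pick up the new code (as happens further down this thread), a more forceful reinstall is worth a shot. This is only a sketch using standard pip flags, and it assumes the package installs under the name tortoise, as the pip uninstall suggestion later in the thread implies:

call .\venv\Scripts\activate.bat
pip uninstall -y tortoise
pip install --force-reinstall --no-deps git+https://git.ecker.tech/mrq/tortoise-tts.git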


I'm still getting the same error:

[snip]

This is a clean install; no files were migrated over.

I have used update.bat as well as update-force.bat before running start.bat.


Strange.

I suppose some surgery is needed. Save tortoise/api.py and place it under .\venv\Lib\site-packages\tortoise\api.py. I'm not sure why it's not updating even with the -U flag, although I wonder if you really need to pip uninstall tortoise first like someone else mentioned earlier.

I'll look into a better way of integrating mrq/tortoise-tts to avoid these issues in the future.
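
If you'd rather not copy the file by hand, fetching it straight into the venv should amount to something like this (a sketch: curl ships with recent Windows 10/11, and I'm assuming the usual Gitea raw-file URL pattern of /raw/branch/main/ applies here):

curl -L -o .\venv\Lib\site-packages\tortoise\api.py https://git.ecker.tech/mrq/tortoise-tts/raw/branch/main/tortoise/api.py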


Overwriting api.py seems to have worked.

Upon trying to prepare a dataset, I get the following error:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating TorToiSe... (using model: None)
Hardware acceleration found: cuda
TorToiSe initialized, ready for generation.
Loading Whisper model: base
Transcribing file: ./voices\h\hapi01.wav
Traceback (most recent call last):
  File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
AttributeError: module 'ffmpeg' has no attribute 'input'

I have already set up ffmpeg in my environment path variables.


That's some other funky thing that sometimes crops up, especially in a Colab notebook, and I swear one method worked one time while the next time it needed a different remedy.

Try:

pip uninstall ffmpeg-python
pip install ffmpeg-python
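
If reinstalling ffmpeg-python alone doesn't do it, one possible culprit (an assumption on my part, not something confirmed in this thread) is a stray ffmpeg package from PyPI shadowing the ffmpeg module that ffmpeg-python provides. A quick check-and-clean sketch, run inside the venv:

pip uninstall -y ffmpeg ffmpeg-python
pip install ffmpeg-python
python -c "import ffmpeg; print(ffmpeg.input)"

If that last command prints a function instead of raising AttributeError, whisper should be able to load audio again.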

I managed to prepare the dataset and generate configuration. However, I have been stuck at this step for about an hour now:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating voice-fixer
Error occurred while tring to initialize voicefixer: PytorchStreamReader failed reading zip archive: failed finding central directory
Initializating TorToiSe... (using model: None)
Hardware acceleration found: cuda
TorToiSe initialized, ready for generation.
Loading Whisper model: base
Transcribing file: ./voices\h\hapi01.wav
Transcribed file: ./voices\h\hapi01.wav, 5 found.
Transcribing file: ./voices\h\hapi02.wav
Transcribed file: ./voices\h\hapi02.wav, 7 found.
Transcribing file: ./voices\h\hapi03.wav
Transcribed file: ./voices\h\hapi03.wav, 4 found.
Batch size is larger than your dataset, clamping...
Unloading TTS to save VRAM.
Spawning process:  train.bat ./training/h/train.yaml

Another thing to note is that turning on voice fixer also gives the error shown above.


Right, I forgot to have it still print to console if Verbose Output or whatever is unchecked. Restart the UI, run training with Verbose Output checked, and see what it's getting hung up on. ~~I'll push a quick commit to print all output anyways~~ Commit 485319c2bb will restore printing to console regardless of that setting.

As for the voicefixer thing, as per the documentation, the download was interrupted and for some reason it's not smart enough to restart downloading on its own. Open %USERPROFILE%\.cache\ and delete voicefixer.
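
On Windows that boils down to something like the following from a command prompt (a sketch; adjust the path if your cache lives somewhere else):

rmdir /s /q "%USERPROFILE%\.cache\voicefixer"

voicefixer should then re-download its checkpoints the next time it initializes.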


~~I deleted the voicefixer folder and upon restarting, it is stuck on this step for about 10 minutes now:~~

Ignore what I wrote above, I restarted it and this time it downloaded successfully.

I proceeded to run the training with Verbose Output checked. This is the output it is stuck on:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Initializating voice-fixer
initialized voice-fixer
Initializating TorToiSe... (using model: None)
Hardware acceleration found: cuda
TorToiSe initialized, ready for generation.
Loading Whisper model: base
Transcribing file: ./voices\h\hapi01.wav
Transcribed file: ./voices\h\hapi01.wav, 5 found.
Transcribing file: ./voices\h\hapi02.wav
Transcribed file: ./voices\h\hapi02.wav, 9 found.
Transcribing file: ./voices\h\hapi03.wav
Transcribed file: ./voices\h\hapi03.wav, 4 found.
Batch size is larger than your dataset, clamping...
Unloading TTS to save VRAM.
Spawning process:  train.bat ./training/h/train.yaml

[snip]

[Training] [2023-02-19T14:53:00.989793] RuntimeError: DataLoader worker (pid(s) 10424, 14060, 14396, 16924) exited unexpectedly

> OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\lib\torch_python.dll" or one of its dependencies.

I'll assume you're OOMing (system RAM, not VRAM).

Outside of the obvious of closing out processes, I'll suggest checking Defer TTS Load in the web UI and restarting the UI, just to make sure no TTS model gets loaded during training.


I have closed most processes and checked Defer TTS Load and tried to run it again. I am stuck on this error:

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Unloading TTS to save VRAM.
Spawning process:  train.bat ./training/h/train.yaml

[snip]

[Training] [2023-02-19T15:33:53.651694] RuntimeError: [enforce fail at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 16777216 bytes.

Given the revamped configuration generation process and (attempting to) unload all models on training, you might be able to get it working again if you lower the Mega Batch Factor or whatever setting down to 1.


Still no dice I'm afraid. It keeps giving me Pytorch/Memory OOM errors no matter how low I tweak the settings. For now I'll have to use colab to train I suppose.


I guess even 3080s can't train it, as there's another user who can't train off a 3080 either.

I'll cobble together a way to load and finetune it as float16 and see if that gets VRAM consumption down, maybe. I only worry about the performance/quality problems from it, but I suppose it's better than nothing.


As I mentioned in #17 (comment), I added an experimental way to train fully at half-precision that will convert the original model to float16. It fails on 8GiB a few steps into training, but peaks at 52% VRAM usage on a machine with 16GiB of VRAM, so it might work on a 3080. You're welcome to try, but I have zero guarantees it'll work or produce usable output yet, as I still need to actually finetune a model at half precision.


I tried to generate configuration with half-precision enabled, but it gives me this error:

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1024, in process_api
    result = await self.call_function(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 836, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\webui.py", line 220, in save_training_settings_proxy
    messages.append(save_training_settings(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\utils.py", line 695, in save_training_settings
    if not os.path.exists(get_halfp_model()):
NameError: name 'get_halfp_model' is not defined

Oops, don't know how I managed to get it working for me, fixed in commit 93b061fb4d.


Updated and upon running start.bat I get this error:

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\main.py", line 19, in <module>
    webui = setup_gradio()
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\webui.py", line 339, in setup_gradio
    history_voices = gr.Dropdown(choices=result_voices, label="Voice", type="value", value=result_voices[0] if len(results_voices) > 0 else "")
NameError: name 'results_voices' is not defined

Fixed, I had it right, but I copied over the typo fix.


I still get an error when trying to save the training configuration. I think it's still the same one from before?

C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1024, in process_api
    result = await self.call_function(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 836, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\webui.py", line 220, in save_training_settings_proxy
    messages.append(save_training_settings(
  File "C:\Users\LXC PC\Desktop\mrqtts\ai-voice-cloning\src\utils.py", line 695, in save_training_settings
    if not os.path.exists(get_halfp_model()):
NameError: name 'get_halfp_model' is not defined

For sure really fixed in 526a430c2a or I'm blowing my brains out. I don't know how it reverted.


I tried running a very small dataset of 16 clips on half precision. Unfortunately, it still gives OOM errors.

C:\Users\User\Desktop\mrqtts\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Spawning process:  train.bat ./training/h/train.yaml

[snip]

[Training] [2023-02-22T22:06:16.263674] 23-02-22 22:06:16.262 - INFO: [epoch:  0, iter:       0, lr:(1.000e-05,1.000e-05,)] step: 0.0000e+00 samples: 1.7000e+01 megasamples: 1.7000e-05 iteration_rate: 3.9558e-01 loss_text_ce: 4.1261e+00 loss_mel_ce: 2.8033e+00 loss_gpt_total: 2.8446e+00 grad_scaler_scale: 6.5536e+04 learning_rate_gpt_0: 1.0000e-05 learning_rate_gpt_1: 1.0000e-05 total_samples_loaded: 1.7000e+01 percent_skipped_samples: 1.0526e-01 percent_conditioning_is_self: 8.9474e-01 gpt_conditioning_encoder: 1.5581e+00 gpt_gpt: 2.1085e+00 gpt_heads: 5.0641e-01
[Training] [2023-02-22T22:06:16.263674] 23-02-22 22:06:16.262 - INFO: Saving models and training states.
[Training] [2023-02-22T22:06:23.283171]
[Training] [2023-02-22T22:06:23.866625] 100%|##########| 1/1 [00:21<00:00, 21.42s/it]
[Training] [2023-02-22T22:06:23.866625] 100%|##########| 1/1 [00:22<00:00, 22.00s/it]
[Training] [2023-02-22T22:06:23.866625]
[Training] [2023-02-22T22:06:32.493621]   0%|          | 0/1 [00:00<?, ?it/s]C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\models\audio\tts\tacotron2\taco_utils.py:17: WavFileWarning: Chunk (non-data) not understood, skipping it.
[Training] [2023-02-22T22:06:32.493621]   sampling_rate, data = read(full_path)
[Training] [2023-02-22T22:06:38.031323]
[Training] [2023-02-22T22:06:38.031323]   0%|          | 0/1 [00:14<?, ?it/s]
[Training] [2023-02-22T22:06:38.031323] Traceback (most recent call last):
[Training] [2023-02-22T22:06:38.031323]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\train.py", line 62, in <module>
[Training] [2023-02-22T22:06:38.031323]     train(args.opt, args.launcher)
[Training] [2023-02-22T22:06:38.031323]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\src\train.py", line 53, in train
[Training] [2023-02-22T22:06:38.031323]     trainer.do_training()
[Training] [2023-02-22T22:06:38.031323]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas\codes\train.py", line 330, in do_training
[Training] [2023-02-22T22:06:38.032827]     self.do_step(train_data)
[Training] [2023-02-22T22:06:38.032827]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas\codes\train.py", line 211, in do_step
[Training] [2023-02-22T22:06:38.032827]     gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)
[Training] [2023-02-22T22:06:38.032827]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 302, in optimize_parameters
[Training] [2023-02-22T22:06:38.032827]     ns = step.do_forward_backward(state, m, step_num, train=train_step, no_ddp_sync=(m+1 < self.batch_factor))
[Training] [2023-02-22T22:06:38.032827]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\trainer\steps.py", line 246, in do_forward_backward
[Training] [2023-02-22T22:06:38.033832]     injected = inj(local_state)
[Training] [2023-02-22T22:06:38.033832]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
[Training] [2023-02-22T22:06:38.033832]     return forward_call(*input, **kwargs)
[Training] [2023-02-22T22:06:38.033832]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\trainer\injectors\base_injectors.py", line 93, in forward
[Training] [2023-02-22T22:06:38.046344]     results = method(*params, **self.args)
[Training] [2023-02-22T22:06:38.046344]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
[Training] [2023-02-22T22:06:38.046344]     return forward_call(*input, **kwargs)
[Training] [2023-02-22T22:06:38.046344]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
[Training] [2023-02-22T22:06:38.052852]     return self.module(*inputs[0], **kwargs[0])
[Training] [2023-02-22T22:06:38.052852]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
[Training] [2023-02-22T22:06:38.052852]     return forward_call(*input, **kwargs)
[Training] [2023-02-22T22:06:38.052852]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\./dlas/codes\models\audio\tts\unified_voice2.py", line 425, in forward
[Training] [2023-02-22T22:06:38.058357]     loss_mel = F.cross_entropy(mel_logits, mel_targets.long())
[Training] [2023-02-22T22:06:38.058357]   File "C:\Users\User\Desktop\mrqtts\ai-voice-cloning\venv\lib\site-packages\torch\nn\functional.py", line 3026, in cross_entropy
[Training] [2023-02-22T22:06:38.059362]     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
[Training] [2023-02-22T22:06:38.059362] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 10.00 GiB total capacity; 8.85 GiB already allocated; 0 bytes free; 9.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It seems to at least be training somewhat (albeit slowly at 20s/it) before OOMing; try lowering the batch size to 2.
If not, then use this copium in place of your train.bat:

call .\venv\Scripts\activate.bat
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:64
python ./src/train.py -opt "%1"
deactivate
pause

And if it absolutely will not let you after trying each of those, then I suppose I'll need to find some more VRAM savings somewhere else, like training out a model with smaller network parameters.


I've had a breakthrough with being able to train even on my 2060.

Refer to #25.

mrq closed this issue 2023-02-23 06:26:15 +07:00