CUDA OOM error when running start.bat #121

Open
opened 2023-03-12 09:12:01 +00:00 by Dulappy · 5 comments

I have a GTX 1650 Super (4 GiB VRAM) and I can't fully run start.bat with hardware acceleration on.

```
Loading autoregressive model: D:\AI\ai-voice-cloning\models\tortoise\autoregressive.pth
Traceback (most recent call last):
  File "D:\AI\ai-voice-cloning\src\utils.py", line 1901, in load_tts
    tts = TextToSpeech(minor_optimizations=not args.low_vram, autoregressive_model_path=autoregressive_model, vocoder_model=args.vocoder_model)
  File "d:\ai\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 320, in __init__
    self.clvp = self.clvp.to(self.device)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 989, in to
    return self._apply(convert)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
    param_applied = fn(param)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.41 GiB already allocated; 0 bytes free; 3.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\AI\ai-voice-cloning\src\main.py", line 22, in <module>
    tts = setup_tortoise()
  File "D:\AI\ai-voice-cloning\src\utils.py", line 1903, in load_tts
    tts = TextToSpeech(minor_optimizations=not args.low_vram)
  File "d:\ai\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 295, in __init__
    self.load_autoregressive_model(autoregressive_model_path)
  File "d:\ai\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 344, in load_autoregressive_model
    self.autoregressive = self.autoregressive.to(self.device)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 989, in to
    return self._apply(convert)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
    param_applied = fn(param)
  File "D:\AI\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 4.00 GiB total capacity; 3.41 GiB already allocated; 0 bytes free; 3.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

The issue seems to occur when loading autoregressive.pth. I looked through the issues page on the wiki, but I couldn't find anything related to fixing OOM errors **before** gaining access to the WebUI.
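(The error text itself points at one knob worth trying first: capping the CUDA caching allocator's split size via `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch, assuming it's added near the top of start.bat; 128 is an arbitrary first guess, and since the log shows 0 bytes free, fragmentation may not be the real problem on a 4 GiB card.)

```
rem Hypothetical mitigation suggested by the OOM message itself:
rem cap the allocator's split size to reduce fragmentation.
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```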

Contributor

Put this in ai-voice-cloning/config/exec.json

```
{
	"listen": null,
	"share": false,
	"low-vram": false,
	"check-for-updates": false,
	"models-from-local-only": false,
	"force-cpu-for-conditioning-latents": true,
	"defer-tts-load": true,
	"prune-nonfinal-outputs": true,
	"device-override": "",
	"sample-batch-size": 4,
	"embed-output-metadata": false,
	"latents-lean-and-mean": false,
	"voice-fixer": true,
	"voice-fixer-use-cuda": true,
	"concurrency-count": 4,
	"output-sample-rate": 44000,
	"autocalculate-voice-chunk-duration-size": 0,
	"output-volume": 1,
	"autoregressive-model": "/home/user/aivoice/models/tortoise/autoregressive.pth",
	"vocoder-model": "bigvgan_24khz_100band",
	"whisper-backend": "openai/whisper",
	"whisper-model": "large",
	"training-default-halfp": false,
	"training-default-bnb": true
}
```

This should let you load into the web UI. In particular `"defer-tts-load": true`, which imo should be the default for exactly this type of thing.

Then you can try low VRAM mode, disabling the voice fixer, and forcing CPU for conditioning latents (see the sketch below). I still see VRAM spike to ~5.6 GB, so I think 6 GB is the minimum atm.
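For reference, those toggles correspond to the following keys in the same exec.json; the values below are the obvious flips for a low-VRAM card (plus a lowered sample batch size), my guess rather than settings anyone has verified on 4 GiB:

```
{
	"low-vram": true,
	"force-cpu-for-conditioning-latents": true,
	"voice-fixer": false,
	"voice-fixer-use-cuda": false,
	"sample-batch-size": 1
}
```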

Perhaps some day we'll fix the memory leak, which would let us completely unload autoregressive.pth from memory once we're done with it; that might make it fit in 4 GB.
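For anyone poking at this, the standard PyTorch recipe for releasing a module's VRAM looks roughly like the sketch below (the function and its names are illustrative, not the repo's actual API); the catch is that any other live reference to the module keeps the memory pinned, which is presumably where the leak hides:

```python
import gc
import torch

def unload_submodule(holder, attr: str) -> None:
    """Sketch: fully release one submodule's VRAM (e.g. an autoregressive model)."""
    getattr(holder, attr).cpu()   # nn.Module.cpu() moves parameters back to system RAM in place
    setattr(holder, attr, None)   # drop the owning reference
    gc.collect()                  # collect any leftover reference cycles
    torch.cuda.empty_cache()      # hand the freed, cached blocks back to the driver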

Author

Thank you! I managed to get into the WebUI and tried to generate a voice, though for some reason the only thing that was generated was some static noise.

I want to share some helpful info: if you're using an NVIDIA graphics card with 6 GB or more of VRAM, you've already set the batch size low, and you still get the error message `RuntimeError: CUDA out of memory.`, then use nvitop to monitor your GPU memory usage.
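In case it's unfamiliar: nvitop is a third-party pip-installable TUI monitor, not part of this repo. A quick sketch of using it, run from a second terminal while the web UI generates:

```
# install into the project's venv (or any Python environment)
pip install nvitop
# live per-process GPU memory view
nvitop
```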

Thank you @zim33, it worked!

I also just found out that there's a "Settings" tab in the web UI where you can change the sample batch size :)
