3080 running out of memory trying to train 10MB of voice files #17

Closed
opened 2023-02-21 05:06:58 +07:00 by AI_Pleb · 5 comments

(I've uploaded the full stack trace as a text file just to save space here.)

I've been trying to train a voice and keep running out of memory on my 10GB 3080:

- The original voices are 9 files totalling 5MB put together.

- After the dataset is prepared, they become 20 split files totalling 10MB.

- After I start training, no matter the configuration, it gets to the same point and fails each time. (I'm not experienced in this, so I might be getting some settings wrong; I mostly used the defaults and validated before saving, but I have also tried pushing the batch size down as low as 2 and it still hasn't worked.)

Hoping it's not that my card is too weak, but I have a feeling a 3080 should be able to train at least a small number of voices?


(GPU 0; 10.00 GiB total capacity; 5.88 GiB already allocated; 0 bytes free; 6.06 GiB reserved in total by PyTorch)

Yeah, that doesn't sound right.

Do you already have `Do Not Load TTS On Startup` checked under Settings? It should guarantee that TorToiSe does not load at all on startup, since it seems rather pernicious about staying in memory despite my best efforts to get it to unload.

At worst, you can always just try training from the command line with the first line it prints out: `.\train.bat ./training/Bang_Shishigami/train.yaml`.

If that fails, I suppose the absolute last thing to do is changing both your `batch_size` to 2 and your `mega_batch_factor` to 2 (or 1) to squeeze out as much as you can.
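For reference, those two knobs live in the generated `train.yaml`. A hypothetical excerpt of the memory-relevant keys (the exact nesting in your generated config may differ):

```yaml
# Hypothetical excerpt of ./training/<voice>/train.yaml -- only the
# VRAM-relevant knobs shown; key nesting may differ in your copy.
datasets:
  train:
    batch_size: 2      # clips per optimizer step; lower = less VRAM
mega_batch_factor: 2   # splits each batch into micro-batches; try 2, then 1
```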

And if that fails, then I guess I'll have to dig into DLAS and figure out how to get some VRAM savings. A quick-and-dirty idea is loading the model as float16, since it should definitely cut down on VRAM usage, but with some caveats that I haven't quite explored yet.

Seems it can't be trained on a 3080 either, as another user mentioned.

I'll have to convert the base model to float16 and see how that fares on VRAM-starved cards (ironic). There's training at half precision, but just flipping that on doesn't reap much VRAM back on its own.
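Mechanically, converting a checkpoint to float16 is simple; a minimal sketch of the idea (the paths are placeholders, and whether the converted model still trains or generates well is exactly the caveat above):

```python
import torch


def to_fp16(state_dict):
    """Cast floating-point tensors in a checkpoint to float16.

    Non-float tensors (step counters, integer buffers) are left alone,
    since casting those would corrupt the checkpoint.
    """
    return {
        k: v.half() if torch.is_tensor(v) and v.is_floating_point() else v
        for k, v in state_dict.items()
    }


# Placeholder paths -- point these at the actual base model:
# state = torch.load("autoregressive.pth", map_location="cpu")
# torch.save(to_fp16(state), "autoregressive_fp16.pth")
```

This roughly halves the size of the weights on disk and in VRAM, but gradients and optimizer state during training are a separate matter, which is why flipping on half-precision training alone doesn't reclaim much.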


I've added/exposed a very experimental training setting: `Half Precision` in commit 8a1a48f31e. It'll convert the original training model to float16 and hint at training at half precision.

I've tested this on a machine with 16GiB of VRAM, but I don't have access to one at 10GiB of VRAM to validate. It fails on a machine with 8GiB three steps into training, but also peaks at 52% VRAM utilization on the absolute lowest settings on a machine with 16GiB of VRAM, so it might work on a 3080.

You're welcome to try it, but I have zero guarantees in it being usable (I honestly haven't even tested generating with the default model converted to float16 yet).


> I've added/exposed a very experimental training setting: `Half Precision` in commit 8a1a48f31e. It'll convert the original training model to float16 and hint at training at half precision.
>
> I've tested this on a machine with 16GiB of VRAM, but I don't have access to one at 10GiB of VRAM to validate. It fails on a machine with 8GiB three steps into training, but also peaks at 52% VRAM utilization on the absolute lowest settings on a machine with 16GiB of VRAM, so it might work on a 3080.
>
> You're welcome to try it, but I have zero guarantees in it being usable (I honestly haven't even tested generating with the default model converted to float16 yet).

I tried to run training but got this error:

Loading Whisper model: base
Transcribing file: ./voices\Bang_Shishigami_S\BNG_03_01_002.wav
Traceback (most recent call last):
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1024, in process_api
    result = await self.call_function(
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 836, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 584, in tracked_fn
    response = fn(*args)
  File "I:\Tortoise_TTS\ai-voice-cloning\src\webui.py", line 182, in prepare_dataset_proxy
    return prepare_dataset( get_voices(load_latents=False)[voice], outdir=f"./training/{voice}/", language=language, progress=progress )
  File "I:\Tortoise_TTS\ai-voice-cloning\src\utils.py", line 570, in prepare_dataset
    result = whisper_model.transcribe(file, language=language if language else "English")
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\whisper\transcribe.py", line 85, in transcribe
    mel = log_mel_spectrogram(audio)
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\whisper\audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "I:\Tortoise_TTS\ai-voice-cloning\venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "C:\Users\at-st\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\at-st\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

I'm guessing you're missing FFmpeg? Grab a copy from https://ffmpeg.org/download.html#build-windows and plop it in one of:

  • `.\ai-voice-cloning\`
  • `.\ai-voice-cloning\bin\` <= should be here
  • `.\ai-voice-cloning\venv\Scripts\`

I'm not expecting it to work, since the user in #6 didn't seem to have any luck on his 3080.
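Since whisper shells out to `ffmpeg` via `subprocess` (which is what raised the `WinError 2` above), a quick way to confirm ffmpeg is actually visible is a small check run in the same venv that launches the web UI (a sketch; `ffmpeg_on_path` is just a helper name here):

```python
import shutil
import subprocess


def ffmpeg_on_path():
    """Return the resolved path to ffmpeg, or None if subprocess.Popen
    would fail with FileNotFoundError (WinError 2 on Windows)."""
    return shutil.which("ffmpeg")


path = ffmpeg_on_path()
if path:
    print("ffmpeg found at:", path)
    subprocess.run(["ffmpeg", "-version"], check=True)
else:
    print("ffmpeg is NOT on PATH; drop ffmpeg.exe into one of the folders above")
```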


I've had a breakthrough with being able to train even on my 2060.

Refer to #25.

mrq closed this issue 2023-02-23 06:26:29 +07:00
Reference: mrq/ai-voice-cloning#17