3080 running out of memory trying to train 10MB of voice files #17
Reference: mrq/ai-voice-cloning#17
(I've uploaded the full stack trace as a text file just to save space here.)
I've been trying to train a voice and keep running out of memory on my 10GB 3080:
- The original voices are 9 files totalling 5MB put together.
- After the dataset is prepared, they become 20 split files totalling 10MB.
- After I start training, no matter the configuration, it gets to the same point and fails each time. (I'm not experienced in this, so I might be getting some settings wrong; I've mostly used the defaults, validated before saving, but I've also tried pushing the batch size down as low as 2 and it still hasn't worked.)
I'm hoping it's not that my card is too weak, but I have a feeling a 3080 should be able to train at least a small number of voices?
Yeah, that doesn't sound right.
Do you already have `Do Not Load TTS On Startup` checked under Settings? It should guarantee that TorToiSe does not load at all on startup, since it seems rather pernicious about staying in memory despite my best efforts to get it to unload.

At worst, you can always just try training from the command line with the first line it prints out:

`.\train.bat ./training/Bang_Shishigami/train.yaml`

If that fails, I suppose the absolute last thing to do is changing both your `batch_size` to 2 and `mega_batch_factor` to 2 (or 1) to squeeze out as much as you can.

And if that fails, then I guess I'll have to dig into DLAS and figure out how to get some VRAM savings. A quick and dirty idea I have in mind is loading the model as float16, since it should definitely cut down on VRAM usage, but with some caveats that I haven't quite explored yet.

Seems it can't be trained on a 3080 either, as another user mentioned.
I'll have to convert the base model to float16 and see how that fares on VRAM starved cards (ironic). There's training at half precision, but just flipping that on doesn't reap much VRAM back on its own.
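Back-of-the-envelope, converting weights to float16 halves the memory needed just for the parameters. A tiny sketch (the 400M parameter count is an assumed, illustrative figure, not TorToiSe's exact size):

```python
def param_mem_gib(n_params: int, bytes_per_param: int) -> float:
    # Memory needed just to hold the weights, in GiB. This ignores
    # activations, gradients, and optimizer state, which are what
    # actually dominate VRAM during training.
    return n_params * bytes_per_param / 1024**3

n = 400_000_000              # assumed parameter count, for illustration only
fp32 = param_mem_gib(n, 4)   # ~1.49 GiB
fp16 = param_mem_gib(n, 2)   # ~0.75 GiB
```

This is why converting the model alone helps but doesn't fix everything: the training-time buffers need half precision too.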
I've added/exposed a very experimental training setting, `Half Precision`, in commit 8a1a48f31e. It'll convert the original training model to float16 and hint at training at half precision.

I've tested this on a machine with 16GiB of VRAM, but I don't have access to one with 10GiB of VRAM to validate. It fails on a machine with 8GiB a few steps into training, but it also peaks at 52% VRAM utilization on the absolute lowest settings on the 16GiB machine, so it might work on a 3080.
You're welcome to try it, but I have zero guarantees in it being usable (I honestly haven't even tested generating with the default model converted to float16 yet).
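For a feel of the float16 caveats mentioned above, the precision and range loss can be demonstrated with the Python stdlib's IEEE 754 half-precision struct format (`'e'`):

```python
import struct

def roundtrip_fp16(x: float) -> float:
    # Pack a double into IEEE 754 half precision and unpack it again,
    # showing what float16 does to values it can't represent exactly.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# 0.5 is exactly representable in half precision; 0.1 is not, and
# very large magnitudes overflow entirely (half tops out at 65504).
```

Small rounding errors like this are usually tolerable for inference, but they can destabilize training, which is part of why flipping half precision on is experimental.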
I tried to run training but got this error:
I'm guessing you're missing FFMPEG? Grab a copy from https://ffmpeg.org/download.html#build-windows and plop it in one of:
- `.\ai-voice-cloning\`
- `.\ai-voice-cloning\bin\` <= should be here
- `.\ai-voice-cloning\venv\Scripts\`
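A quick way to confirm the binary is actually visible (assuming the directory you dropped it into ends up on PATH, e.g. the venv's `Scripts\` once activated) is a one-liner with the stdlib:

```python
import shutil

# shutil.which searches PATH the same way the shell does; it returns
# the full path to ffmpeg(.exe) if found, or None if it's missing.
ffmpeg_path = shutil.which("ffmpeg")
print(ffmpeg_path or "ffmpeg not found on PATH")
```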
I'm not expecting it to work, since #6 didn't seem to have any luck on his 3080.
I've had a breakthrough with being able to train even on my 2060.
Refer to #25.