Why does my AMD GPU eat up too much vram? #395
Reference: mrq/ai-voice-cloning#395
I'm using an RX 7800 XT. I installed ROCm with rocm-hip-sdk and rocm-opencl-sdk on my Artix Linux distro. I added the line export HSA_OVERRIDE_GFX_VERSION=10.3.0 to my .bashrc file. I installed PyTorch 2.1.0 with ROCm 5.5 (pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.5). I get this error message:
HIP out of memory. Tried to allocate 183.50 GiB. GPU 0 has a total capacty of 15.98 GiB of which 9.90 GiB is free. Of the allocated memory 5.49 GiB is allocated by PyTorch, and 159.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
Well, I don't have a good stack trace to work with. I need to know exactly where it's OOMing before I can evaluate it.
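For anyone unsure how to grab that, here is a minimal sketch of capturing a full traceback in Python. The names `run_and_capture` and `fake_generate` are hypothetical and not part of this repo; `fake_generate` only stands in for whatever call actually triggers the OOM. It assumes the error surfaces as an ordinary Python exception.

```python
import traceback


def run_and_capture(fn, *args, **kwargs):
    """Call fn; on failure return the full formatted traceback as a string."""
    try:
        fn(*args, **kwargs)
    except Exception:
        # format_exc() lists every frame, so the failing call site is visible
        return traceback.format_exc()
    return None


def fake_generate():
    # Hypothetical stand-in for the call that runs out of memory
    raise RuntimeError("HIP out of memory. Tried to allocate 183.50 GiB.")


if __name__ == "__main__":
    report = run_and_capture(fake_generate)
    print(report)
```

Pasting the whole printed report, not just the last line, is what shows where the allocation was requested.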
However, I will preface that a 7800XT isn't guaranteed to have ROCm support at the moment, especially with ROCm 5.5, given that it was released before the 7800XT. You're also trying to have it treated as a gfx1030 (6800XT), which is going to cause issues due to differences in the ISAs at minimum. Even setting it to 11.0.0 (gfx1100, a 7900XTX) isn't guaranteed to work, because they're technically not the same die type.
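To double-check which target an override actually requests, a rough sketch like the one below can help. The helpers `override_to_gfx` and `describe_override` are hypothetical. The mapping rule (major, then minor and stepping as hex digits) is an assumption that matches the two pairs mentioned above (10.3.0 as gfx1030, 11.0.0 as gfx1100). It only reads the environment variable and says nothing about whether ROCm will accept the value.

```python
import os


def override_to_gfx(version):
    """Turn an override string like '10.3.0' into a target name like 'gfx1030'."""
    major, minor, stepping = (int(part) for part in version.split("."))
    # Assumption: minor and stepping are written as single hex digits
    return f"gfx{major}{minor:x}{stepping:x}"


def describe_override(env=None):
    """Report the override currently set, if any, and the target it implies."""
    env = os.environ if env is None else env
    value = env.get("HSA_OVERRIDE_GFX_VERSION")
    if value is None:
        return "no override set"
    return f"HSA_OVERRIDE_GFX_VERSION={value} -> {override_to_gfx(value)}"


if __name__ == "__main__":
    print(describe_override())
```

Running it in the same shell that launches the program confirms whether the .bashrc line is actually in effect there.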