Why does my AMD GPU eat up so much VRAM? #395

Closed
opened 2023-09-20 06:59:58 +00:00 by Bluebomber182 · 1 comment

I'm using an RX 7800 XT. I installed ROCm with rocm-hip-sdk and rocm-opencl-sdk on my Artix Linux distro and added the line export HSA_OVERRIDE_GFX_VERSION=10.3.0 to my .bashrc file. I then installed PyTorch 2.1.0 for ROCm 5.5 (pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.5). I get this error message:

HIP out of memory. Tried to allocate 183.50 GiB. GPU 0 has a total capacty of 15.98 GiB of which 9.90 GiB is free. Of the allocated memory 5.49 GiB is allocated by PyTorch, and 159.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
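The numbers in that message are telling on their own: the single request (183.50 GiB) is roughly 11.5 times the card's entire capacity (15.98 GiB), so no max_split_size_mb or other PYTORCH_HIP_ALLOC_CONF setting can make it fit. That points at something asking for an unexpectedly huge tensor rather than at fragmentation. The sketch below parses such a message and checks that ratio; the helper names are hypothetical, and the regex assumes the message format shown above.

```python
import re

# Byte multipliers for the units PyTorch prints in its OOM messages.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Assumed message shape, taken from the log above. "capac(?:i)?ty" accepts both
# the "capacty" spelling in that log and the correctly spelled "capacity".
_OOM_RE = re.compile(
    r"Tried to allocate ([\d.]+) (B|KiB|MiB|GiB)\. "
    r"GPU \d+ has a total capac(?:i)?ty of ([\d.]+) (B|KiB|MiB|GiB) "
    r"of which ([\d.]+) (B|KiB|MiB|GiB) is free"
)


def _to_bytes(value: str, unit: str) -> int:
    return int(float(value) * _UNITS[unit])


def parse_oom_message(msg: str) -> dict:
    """Return requested/total/free sizes in bytes from a PyTorch OOM message."""
    m = _OOM_RE.search(msg)
    if m is None:
        raise ValueError("not a recognizable PyTorch out-of-memory message")
    return {
        "requested": _to_bytes(m.group(1), m.group(2)),
        "total": _to_bytes(m.group(3), m.group(4)),
        "free": _to_bytes(m.group(5), m.group(6)),
    }


def diagnose(msg: str) -> str:
    """Say whether the failed request could ever fit on the card."""
    info = parse_oom_message(msg)
    ratio = info["requested"] / info["total"]
    if ratio > 1:
        return (f"request is {ratio:.1f}x the GPU's total memory; "
                "allocator tuning cannot make it fit")
    return "request fits in total memory; fragmentation or other usage may be the cause"


if __name__ == "__main__":
    log = ("HIP out of memory. Tried to allocate 183.50 GiB. GPU 0 has a total "
           "capacty of 15.98 GiB of which 9.90 GiB is free.")
    print(diagnose(log))
```

Because the request exceeds the card's total memory rather than just its free memory, the useful next step is finding which operation computed that size, not tuning the allocator.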

Owner

Well, I don't have a good stack trace to work with. I need to know exactly where it's OOMing before I can evaluate it.
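One way to get that location is to wrap the failing call so the full Python traceback is printed at the moment of the OOM. This sketch relies on PyTorch's out-of-memory error being a RuntimeError subclass whose message contains "out of memory" (the log above reads "HIP out of memory"). The name run_with_oom_report is hypothetical, and torch.cuda.memory_summary() is only called when torch can be imported and a GPU is visible.

```python
import sys
import traceback
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def run_with_oom_report(fn: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Call fn; on an out-of-memory RuntimeError, dump the full traceback
    (plus the allocator summary if torch is available) to stderr, then
    re-raise so the failure is not silently swallowed."""
    try:
        return fn(*args, **kwargs)
    except RuntimeError as exc:
        if "out of memory" not in str(exc):
            raise  # unrelated error: pass it through untouched
        traceback.print_exc(file=sys.stderr)
        try:
            import torch  # optional; only used for the memory summary
            if torch.cuda.is_available():  # ROCm builds also use torch.cuda
                print(torch.cuda.memory_summary(), file=sys.stderr)
        except ImportError:
            pass
        raise
```

For example, replacing a direct call such as generate(text) with run_with_oom_report(generate, text), where generate stands in for whatever call actually fails, would print every Python frame down to the operation that requested the 183.50 GiB.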

However, I'll preface this by saying that the 7800XT isn't guaranteed to have ROCm support at the moment, especially with ROCm 5.5, since that release came out before the 7800XT did. You're also telling it to treat the card as a gfx1030 (6800XT), which is going to cause issues, at minimum because of differences in the ISAs. Even setting it to 11.0.0 (gfx1100, the 7900XTX) isn't guaranteed to work, because they're technically not the same die type.
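To make the override concrete, the value of HSA_OVERRIDE_GFX_VERSION maps onto a gfx target name. The helper below assumes the usual LLVM AMDGPU naming (major version in decimal, then one hex digit each for minor and stepping), so 10.3.0 becomes gfx1030 and 11.0.0 becomes gfx1100; neither is gfx1101, the target of the RX 7800 XT's Navi 32 die. The function names are hypothetical, and the native target is hard-coded as an assumption rather than queried from the driver.

```python
import os
from typing import Optional

# Assumed native target for an RX 7800 XT (Navi 32); not queried from the driver.
RX_7800_XT_TARGET = "gfx1101"


def gfx_target_from_version(version: str) -> str:
    """Turn "MAJOR.MINOR.STEPPING" (e.g. "10.3.0") into a name like "gfx1030".

    Major is written in decimal; minor and stepping become one hex digit each.
    """
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.STEPPING, got {version!r}")
    major, minor, stepping = (int(p) for p in parts)
    if not (0 <= minor < 16 and 0 <= stepping < 16):
        raise ValueError("minor and stepping must each fit in one hex digit")
    return f"gfx{major}{minor:x}{stepping:x}"


def describe_override(native: str = RX_7800_XT_TARGET) -> str:
    """Report what HSA_OVERRIDE_GFX_VERSION (if set) asks the runtime to assume."""
    value: Optional[str] = os.environ.get("HSA_OVERRIDE_GFX_VERSION")
    if not value:
        return f"no override set; the card is treated as {native}"
    target = gfx_target_from_version(value)
    if target == native:
        return f"override matches the native target {native}"
    return f"override presents a {native} card as {target}"


if __name__ == "__main__":
    print(describe_override())
```

Running it from a shell that has sourced the .bashrc above shows which target the exported line actually requests.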

Reference: mrq/ai-voice-cloning#395