bitsandbytes-rocm/csrc
arlo-phoenix 0b481bfcc2 Use workaround for ROCm wave32 recognition
Forcibly sets __AMDGCN_WAVEFRONT_SIZE to 32.
This is not strictly correct (some GPUs don't support wave32), but it works
on the supported GPUs. It can be disabled with DISABLE_WARP_32.

With this change, blockwise quantization works, and with it NF4 is supported.
2023-08-08 18:50:26 +00:00
common.cpp
common.h
cpu_ops.cpp
cpu_ops.h
kernels.cu Use workaround for ROCm wave32 recognition 2023-08-08 18:50:26 +00:00
kernels.cuh
ops.cu Use workaround for ROCm wave32 recognition 2023-08-08 18:50:26 +00:00
ops.cuh Use workaround for ROCm wave32 recognition 2023-08-08 18:50:26 +00:00
pythonInterface.c
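
The commit message above describes forcing __AMDGCN_WAVEFRONT_SIZE to 32 unless DISABLE_WARP_32 is defined. A minimal sketch of how such a preprocessor override might look is given below. It is not taken from kernels.cu, ops.cu, or ops.cuh; the names kWarpSize and warps_per_block are hypothetical and exist only for illustration, and the fallback value of 32 when the macro is absent is an assumption.

```cpp
// Hypothetical sketch of the wave32 workaround; not the repository's code.
// Only __AMDGCN_WAVEFRONT_SIZE and DISABLE_WARP_32 come from the commit message.

#if !defined(DISABLE_WARP_32)
// Override whatever wavefront size the compiler reported and force 32.
// As the commit notes, this is wrong on GPUs that only support wave64.
#  undef __AMDGCN_WAVEFRONT_SIZE
#  define __AMDGCN_WAVEFRONT_SIZE 32
#endif

#if defined(__AMDGCN_WAVEFRONT_SIZE)
// Use the (possibly overridden) wavefront size.
constexpr int kWarpSize = __AMDGCN_WAVEFRONT_SIZE;
#else
// Assumed fallback when the macro is absent (e.g. a non-AMD build).
constexpr int kWarpSize = 32;
#endif

// Number of warps needed to cover `threads` threads (rounds up).
constexpr int warps_per_block(int threads) {
    return (threads + kWarpSize - 1) / kWarpSize;
}
```

Building with -DDISABLE_WARP_32 skips the override, so kWarpSize then follows whatever the compiler defines for the target, which is the escape hatch the commit message mentions for GPUs without wave32 support.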