bitsandbytes-rocm/csrc
arlo-phoenix 0b481bfcc2 Use workaround for ROCm wave32 recognition
Forcibly sets __AMDGCN_WAVEFRONT_SIZE to 32.
This is not strictly correct (some GPUs don't support wave32), but it works
on the supported GPUs. The workaround can be disabled with DISABLE_WARP_32.

With this change, blockwise quantization works, and with it nf4 is supported.
2023-08-08 18:50:26 +00:00
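The workaround described in the commit message above could look roughly like the following preprocessor guard. This is a minimal sketch, not the code actually committed to kernels.cu, ops.cu, or ops.cuh: only `__AMDGCN_WAVEFRONT_SIZE` and `DISABLE_WARP_32` come from the commit message, while `kWarpSize`, `warps_per_block`, and the fallback value of 64 are hypothetical names and assumptions introduced for illustration.

```cpp
// Sketch only: illustrates the idea from the commit message, not the real source.

// Unless the build opts out with -DDISABLE_WARP_32, override whatever the
// compiler reported and force the wavefront size to 32.
// Caveat from the commit message: some GPUs do not support wave32.
#ifndef DISABLE_WARP_32
#  ifdef __AMDGCN_WAVEFRONT_SIZE
#    undef __AMDGCN_WAVEFRONT_SIZE
#  endif
#  define __AMDGCN_WAVEFRONT_SIZE 32
#endif

// Hypothetical constant that the rest of the code would use as its warp width.
#ifdef __AMDGCN_WAVEFRONT_SIZE
constexpr int kWarpSize = __AMDGCN_WAVEFRONT_SIZE;
#else
constexpr int kWarpSize = 64;  // assumption: fall back to wave64 if nothing is defined
#endif

// Hypothetical helper: number of warps needed to cover one thread block.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}
```

With the override active, a 64-thread block is treated as two 32-wide warps. Building with `-DDISABLE_WARP_32` skips the override, so the sketch uses the compiler's own value if one is defined and the assumed fallback otherwise.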
common.cpp Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
common.h Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
cpu_ops.cpp Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
cpu_ops.h Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
kernels.cu Use workaround for ROCm wave32 recognition 2023-08-08 18:50:26 +00:00
kernels.cuh Added fp32 compute type for gemv_4bit. 2023-07-09 21:06:01 -07:00
ops.cu Use workaround for ROCm wave32 recognition 2023-08-08 18:50:26 +00:00
ops.cuh Use workaround for ROCm wave32 recognition 2023-08-08 18:50:26 +00:00
pythonInterface.c Add HIP to cuda defines 2023-08-05 02:11:46 +02:00