0b481bfcc2
Forcibly sets __AMDGCN_WAVEFRONT_SIZE to 32. This is not correct in general (some GPUs don't support wave32), but it works on the GPUs that do. The override can be disabled with DISABLE_WARP_32. With this change blockwise quantize works, and with that nf4 is supported. |
||
---|---|---|
common.cpp | ||
common.h | ||
cpu_ops.cpp | ||
cpu_ops.h | ||
kernels.cu | ||
kernels.cuh | ||
ops.cu | ||
ops.cuh | ||
pythonInterface.c |
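
The override described in the commit message could look roughly like the preprocessor guard below. This is a minimal sketch, not the actual patch: `__AMDGCN_WAVEFRONT_SIZE` and `DISABLE_WARP_32` come from the commit message, while `kWavefrontSize`, `wavefronts_per_block`, and the fallback value of 64 are hypothetical names and assumptions added for illustration.

```cpp
#include <cassert>

// Sketch only: mirrors the behaviour the commit message describes.
// Unless DISABLE_WARP_32 is defined, force the wavefront size macro to 32,
// replacing whatever value the compiler may have predefined.
#ifndef DISABLE_WARP_32
#  ifdef __AMDGCN_WAVEFRONT_SIZE
#    undef __AMDGCN_WAVEFRONT_SIZE
#  endif
#  define __AMDGCN_WAVEFRONT_SIZE 32
#endif

// Hypothetical constant so the rest of the code reads one name.
// Assumption: when the override is disabled and the compiler does not
// define the macro (e.g. a host-only build), fall back to 64.
#ifdef __AMDGCN_WAVEFRONT_SIZE
constexpr int kWavefrontSize = __AMDGCN_WAVEFRONT_SIZE;
#else
constexpr int kWavefrontSize = 64;
#endif

// Hypothetical helper: number of wavefronts needed to cover `threads`
// threads, rounding up.
constexpr int wavefronts_per_block(int threads) {
    return (threads + kWavefrontSize - 1) / kWavefrontSize;
}
```

Building with `-DDISABLE_WARP_32` skips the override, so the compiler's own value (or the assumed fallback) is used instead. As the commit message notes, forcing 32 is only valid on GPUs that actually support wave32.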