
37 Commits

Author SHA1 Message Date
arlo-phoenix
0b481bfcc2 Use workaround for ROCm wave32 recognition
Just forcibly sets __AMDGCN_WAVEFRONT_SIZE to 32.
Not correct (some GPUs don't support wave32), but works
on the supported GPUs. Can be disabled with DISABLE_WARP_32.

With this, blockwise quantization works, and with that nf4 is supported.
2023-08-08 18:50:26 +00:00
arlo-phoenix
705bc024d2 Makefile: Add make hip 2023-08-05 02:41:58 +02:00
Tim Dettmers
7b6cfe1738 Added H100 support for CUDA 11.8 precompiled binaries. 2023-07-13 21:16:23 -07:00
Tim Dettmers
73aa4e0a33 Fixed Makefile and added CUDA 12.2 install. 2023-07-10 06:34:04 -07:00
Tim Dettmers
5f492d437e Merge remote-tracking branch 'origin/inference' 2023-07-10 06:24:24 -07:00
Tim Dettmers
4395d68cf6 Release 0.39.1. 2023-06-19 19:40:41 -07:00
Tim Dettmers
b7f04e2a20 Added lookup table. 2023-05-30 20:07:05 -07:00
Tim Dettmers
ac5550a023 Added changes for deployment. 2023-05-30 19:06:59 -07:00
Tim Dettmers
0f40fa3f0a Bumped version. 2023-05-23 19:55:52 -07:00
Tim Dettmers
2bce175d15 Fixed Makefile. 2023-05-23 18:42:19 -07:00
Tim Dettmers
d1c4c20568 Added non-cutlass template. 2023-04-27 15:11:26 -07:00
Tim Dettmers
0afc8e9e2f Best attempt at cutlass3. 2023-04-26 17:12:34 -07:00
Tim Dettmers
84964db937 CUTLASS compiles. 2023-04-25 17:15:51 -07:00
Tim Dettmers
6bfd7a405f Initial template. 2023-04-25 16:13:43 -07:00
Tim Dettmers
c4cfe4fbdd Added bf16 Adam. 2023-04-01 10:33:03 -07:00
Tim Dettmers
de53588934 Added Int8 matmul support for all GPUs. Full backward support. 2023-02-01 20:09:31 -08:00
Tim Dettmers
3901ebf7ae Added CUDA 12.0 support; removed CC 3.0 support. 2023-01-04 02:28:33 -08:00
Tom Aarsen
1eec77d34c Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
Tim Dettmers
758c7175a2 Merge branch 'debug' into cuda-bin-switch-and-cli 2022-08-04 08:03:00 -07:00
Tim Dettmers
2f01865a2f Added CUDA block assert and is_on_gpu check. 2022-08-03 09:05:37 -07:00
Tim Dettmers
4a6ea7e24b Added adjusted build file. 2022-07-31 20:59:34 -07:00
Tim Dettmers
28d1e7dc01 Initial build script changes (untested on PyPI). 2022-07-31 19:41:56 -07:00
Tim Dettmers
a409213656 Fixed make default to compile with cublaslt. 2022-07-26 19:38:17 -07:00
Tim Dettmers
f2dd703251 Added matmul build and flags. 2022-07-25 22:34:14 -07:00
Tim Dettmers
9268dc9d88 Some progress on build script; added multi-cuda install script. 2022-07-25 19:30:37 -07:00
Tim Dettmers
1e88edd8c0 Removed rowscale (segfaults on Ampere). 2022-07-25 17:27:57 -07:00
Tim Dettmers
8b1fd32e3e Fixed makefile; fixed Ampere igemmlt_8 bug. 2022-07-25 14:02:14 -07:00
Max Ryabinin
8258b4364a Add a CPU-only build option 2022-07-01 17:16:10 +03:00
Tim Dettmers
4e60e7dc62 Fixed makefile compute capabilities. 2021-11-29 09:54:19 -08:00
Tim Dettmers
b3fe8a6d0f Upgraded to -std=c++14; printing gpp version. #12 2021-11-28 21:31:03 -08:00
Tim Dettmers
2f8083bd8b Added AdamW. #10 #13 2021-11-28 21:18:11 -08:00
Tim Dettmers
c1ed5d39b9 Fixed compilation flag for CUDA 11.0. 2021-10-21 22:30:55 -07:00
Tim Dettmers
0fb378b4ee Added compilation from source instructions; easier compilation. 2021-10-21 17:22:43 -07:00
Tim Dettmers
a6eae2e7f2 Added skip_zeros; tests are passing. 2021-10-20 19:15:47 -07:00
Tim Dettmers
8400b58cbb Added Kepler and fixed V100+CUDA101 support. #4 #5 2021-10-17 21:21:39 -07:00
Tim Dettmers
7923c4a066 Changed from testpypi to pypi. Release 0.0.24 2021-10-07 08:39:38 -07:00
Tim Dettmers
7439924891 Initial commit 2021-10-05 19:16:20 -07:00
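Commit 0b481bfcc2 describes its wave32 workaround only in prose. Here is a hypothetical sketch of that behavior, not the commit's actual code. It assumes the override is done with the preprocessor: __AMDGCN_WAVEFRONT_SIZE is forced to 32 unless DISABLE_WARP_32 is defined. The fallback value of 64 and the names kWarpSize and warp_size() are illustrative assumptions.

```cpp
// Hypothetical sketch of the wave32 workaround described in commit 0b481bfcc2.
// Assumption: a preprocessor override; the real change may be structured differently.

#ifndef DISABLE_WARP_32
// Force wave32 regardless of what the compiler reports. As the commit notes,
// this is wrong on GPUs that do not support wave32.
#undef __AMDGCN_WAVEFRONT_SIZE
#define __AMDGCN_WAVEFRONT_SIZE 32
#endif

#ifndef __AMDGCN_WAVEFRONT_SIZE
// Hypothetical fallback for compilers that never define the macro
// (for example a plain host compiler) when the override is disabled.
#define __AMDGCN_WAVEFRONT_SIZE 64
#endif

// Compile-time warp width that kernels would size their work against.
constexpr int kWarpSize = __AMDGCN_WAVEFRONT_SIZE;

// Runtime accessor for the same value (hypothetical helper).
inline int warp_size() { return kWarpSize; }
```

Building with `-DDISABLE_WARP_32` skips the override. The compiler's own value is then used, or the hypothetical 64 fallback on a compiler that does not define the macro.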