26 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Tim Dettmers | 0afc8e9e2f | Best attempt at cutlass3. | 2023-04-26 17:12:34 -07:00 |
| Tim Dettmers | 84964db937 | CUTLASS compiles. | 2023-04-25 17:15:51 -07:00 |
| Tim Dettmers | 6bfd7a405f | Initial template. | 2023-04-25 16:13:43 -07:00 |
| Tim Dettmers | c4cfe4fbdd | Added bf16 Adam. | 2023-04-01 10:33:03 -07:00 |
| Tim Dettmers | de53588934 | Added Int8 matmul support for all GPUs. Full backward support. | 2023-02-01 20:09:31 -08:00 |
| Tim Dettmers | 3901ebf7ae | Added CUDA 12.0 support; removed CC 3.0 support. | 2023-01-04 02:28:33 -08:00 |
| Tom Aarsen | 1eec77d34c | Remove trailing whitespace & ensure newline at EOF | 2022-10-27 13:11:29 +02:00 |
| Tim Dettmers | 758c7175a2 | Merge branch 'debug' into cuda-bin-switch-and-cli | 2022-08-04 08:03:00 -07:00 |
| Tim Dettmers | 2f01865a2f | Added CUDA block assert and is_on_gpu check. | 2022-08-03 09:05:37 -07:00 |
| Tim Dettmers | 4a6ea7e24b | Added adjusted build file. | 2022-07-31 20:59:34 -07:00 |
| Tim Dettmers | 28d1e7dc01 | Initial build script changes (untested on PyPi). | 2022-07-31 19:41:56 -07:00 |
| Tim Dettmers | a409213656 | Fixed make default to compile with cublaslt. | 2022-07-26 19:38:17 -07:00 |
| Tim Dettmers | f2dd703251 | Added matmul build and flags. | 2022-07-25 22:34:14 -07:00 |
| Tim Dettmers | 9268dc9d88 | Some progress on build script; added multi-cuda install script. | 2022-07-25 19:30:37 -07:00 |
| Tim Dettmers | 1e88edd8c0 | Removed rowscale (segfaults on ampere). | 2022-07-25 17:27:57 -07:00 |
| Tim Dettmers | 8b1fd32e3e | Fixed makefile; fixed Ampere igemmlt_8 bug. | 2022-07-25 14:02:14 -07:00 |
| Max Ryabinin | 8258b4364a | Add a CPU-only build option | 2022-07-01 17:16:10 +03:00 |
| Tim Dettmers | 4e60e7dc62 | Fixed makefile compute capabilities. | 2021-11-29 09:54:19 -08:00 |
| Tim Dettmers | b3fe8a6d0f | Upgraded to -std=c++14; printing gpp version. #12 | 2021-11-28 21:31:03 -08:00 |
| Tim Dettmers | 2f8083bd8b | Added AdamW. #10 #13 | 2021-11-28 21:18:11 -08:00 |
| Tim Dettmers | c1ed5d39b9 | Fixed compilation flag for CUDA 11.0. | 2021-10-21 22:30:55 -07:00 |
| Tim Dettmers | 0fb378b4ee | Added compilation from source instructions; easier compilation. | 2021-10-21 17:22:43 -07:00 |
| Tim Dettmers | a6eae2e7f2 | Added skip_zeros; tests are passing. | 2021-10-20 19:15:47 -07:00 |
| Tim Dettmers | 8400b58cbb | Added Kepler and fixed V100+CUDA101 support. #4 #5 | 2021-10-17 21:21:39 -07:00 |
| Tim Dettmers | 7923c4a066 | Changed from testpypi to pypi. Release 0.0.24 | 2021-10-07 08:39:38 -07:00 |
| Tim Dettmers | 7439924891 | Initial commit | 2021-10-05 19:16:20 -07:00 |