| Author | Commit | Message | Date |
|---|---|---|---|
| Tim Dettmers | 0afc8e9e2f | Best attempt at cutlass3. | 2023-04-26 17:12:34 -07:00 |
| Tim Dettmers | 84964db937 | CUTLASS compiles. | 2023-04-25 17:15:51 -07:00 |
| Tim Dettmers | 6bfd7a405f | Initial template. | 2023-04-25 16:13:43 -07:00 |
| Tim Dettmers | c4cfe4fbdd | Added bf16 Adam. | 2023-04-01 10:33:03 -07:00 |
| Tim Dettmers | de53588934 | Added Int8 matmul support for all GPUs. Full backward support. | 2023-02-01 20:09:31 -08:00 |
| Tim Dettmers | 3901ebf7ae | Added CUDA 12.0 support; removed CC 3.0 support. | 2023-01-04 02:28:33 -08:00 |
| Tom Aarsen | 1eec77d34c | Remove trailing whitespace & ensure newline at EOF | 2022-10-27 13:11:29 +02:00 |
| Tim Dettmers | 758c7175a2 | Merge branch 'debug' into cuda-bin-switch-and-cli | 2022-08-04 08:03:00 -07:00 |
| Tim Dettmers | 2f01865a2f | Added CUDA block assert and is_on_gpu check. | 2022-08-03 09:05:37 -07:00 |
| Tim Dettmers | 4a6ea7e24b | Added adjusted build file. | 2022-07-31 20:59:34 -07:00 |
| Tim Dettmers | 28d1e7dc01 | Initial build script changes (untested on PyPi). | 2022-07-31 19:41:56 -07:00 |
| Tim Dettmers | a409213656 | Fixed make default to compile with cublaslt. | 2022-07-26 19:38:17 -07:00 |
| Tim Dettmers | f2dd703251 | Added matmul build and flags. | 2022-07-25 22:34:14 -07:00 |
| Tim Dettmers | 9268dc9d88 | Some progress on build script; added multi-cuda install script. | 2022-07-25 19:30:37 -07:00 |
| Tim Dettmers | 1e88edd8c0 | Removed rowscale (segfaults on ampere). | 2022-07-25 17:27:57 -07:00 |
| Tim Dettmers | 8b1fd32e3e | Fixed makefile; fixed Ampere igemmlt_8 bug. | 2022-07-25 14:02:14 -07:00 |
| Max Ryabinin | 8258b4364a | Add a CPU-only build option | 2022-07-01 17:16:10 +03:00 |
| Tim Dettmers | 4e60e7dc62 | Fixed makefile compute capabilities. | 2021-11-29 09:54:19 -08:00 |
| Tim Dettmers | b3fe8a6d0f | Upgraded to -std=c++14; printing gpp version. #12 | 2021-11-28 21:31:03 -08:00 |
| Tim Dettmers | 2f8083bd8b | Added AdamW. #10 #13 | 2021-11-28 21:18:11 -08:00 |
| Tim Dettmers | c1ed5d39b9 | Fixed compilation flag for CUDA 11.0. | 2021-10-21 22:30:55 -07:00 |
| Tim Dettmers | 0fb378b4ee | Added compilation from source instructions; easier compilation. | 2021-10-21 17:22:43 -07:00 |
| Tim Dettmers | a6eae2e7f2 | Added skip_zeros; tests are passing. | 2021-10-20 19:15:47 -07:00 |
| Tim Dettmers | 8400b58cbb | Added Kepler and fixed V100+CUDA101 support. #4 #5 | 2021-10-17 21:21:39 -07:00 |
| Tim Dettmers | 7923c4a066 | Changed from testpypi to pypi. Release 0.0.24 | 2021-10-07 08:39:38 -07:00 |
| Tim Dettmers | 7439924891 | Initial commit | 2021-10-05 19:16:20 -07:00 |