arlo-phoenix
0b481bfcc2
Use workaround for ROCm wave32 recognition
Just sets __AMDGCN_WAVEFRONT_SIZE forcefully to 32.
Not correct (some GPUs don't support wave32), but works
on the supported GPUs. Can be disabled with DISABLE_WARP_32.
With this, blockwise quantize works, and with that nf4 is supported.
2023-08-08 18:50:26 +00:00
arlo-phoenix
705bc024d2
Makefile: Add make hip
2023-08-05 02:41:58 +02:00
Tim Dettmers
7b6cfe1738
Added H100 support for CUDA 11.8 precompiled binaries.
2023-07-13 21:16:23 -07:00
Tim Dettmers
73aa4e0a33
Fixed Makefile and added CUDA 12.2 install.
2023-07-10 06:34:04 -07:00
Tim Dettmers
5f492d437e
Merge remote-tracking branch 'origin/inference'
2023-07-10 06:24:24 -07:00
Tim Dettmers
4395d68cf6
Release 0.39.1.
2023-06-19 19:40:41 -07:00
Tim Dettmers
b7f04e2a20
Added lookup table.
2023-05-30 20:07:05 -07:00
Tim Dettmers
ac5550a023
Added changes for deployment.
2023-05-30 19:06:59 -07:00
Tim Dettmers
0f40fa3f0a
Bumped version.
2023-05-23 19:55:52 -07:00
Tim Dettmers
2bce175d15
Fixed Makefile.
2023-05-23 18:42:19 -07:00
Tim Dettmers
d1c4c20568
Added non-cutlass template.
2023-04-27 15:11:26 -07:00
Tim Dettmers
0afc8e9e2f
Best attempt at cutlass3.
2023-04-26 17:12:34 -07:00
Tim Dettmers
84964db937
CUTLASS compiles.
2023-04-25 17:15:51 -07:00
Tim Dettmers
6bfd7a405f
Initial template.
2023-04-25 16:13:43 -07:00
Tim Dettmers
c4cfe4fbdd
Added bf16 Adam.
2023-04-01 10:33:03 -07:00
Tim Dettmers
de53588934
Added Int8 matmul support for all GPUs. Full backward support.
2023-02-01 20:09:31 -08:00
Tim Dettmers
3901ebf7ae
Added CUDA 12.0 support; removed CC 3.0 support.
2023-01-04 02:28:33 -08:00
Tom Aarsen
1eec77d34c
Remove trailing whitespace & ensure newline at EOF
2022-10-27 13:11:29 +02:00
Tim Dettmers
758c7175a2
Merge branch 'debug' into cuda-bin-switch-and-cli
2022-08-04 08:03:00 -07:00
Tim Dettmers
2f01865a2f
Added CUDA block assert and is_on_gpu check.
2022-08-03 09:05:37 -07:00
Tim Dettmers
4a6ea7e24b
Added adjusted build file.
2022-07-31 20:59:34 -07:00
Tim Dettmers
28d1e7dc01
Initial build script changes (untested on PyPI).
2022-07-31 19:41:56 -07:00
Tim Dettmers
a409213656
Fixed make default to compile with cublaslt.
2022-07-26 19:38:17 -07:00
Tim Dettmers
f2dd703251
Added matmul build and flags.
2022-07-25 22:34:14 -07:00
Tim Dettmers
9268dc9d88
Some progress on build script; added multi-cuda install script.
2022-07-25 19:30:37 -07:00
Tim Dettmers
1e88edd8c0
Removed rowscale (segfaults on ampere).
2022-07-25 17:27:57 -07:00
Tim Dettmers
8b1fd32e3e
Fixed makefile; fixed Ampere igemmlt_8 bug.
2022-07-25 14:02:14 -07:00
Max Ryabinin
8258b4364a
Add a CPU-only build option
2022-07-01 17:16:10 +03:00
Tim Dettmers
4e60e7dc62
Fixed makefile compute capabilities.
2021-11-29 09:54:19 -08:00
Tim Dettmers
b3fe8a6d0f
Upgraded to -std=c++14; printing gpp version. #12
2021-11-28 21:31:03 -08:00
Tim Dettmers
2f8083bd8b
Added AdamW. #10 #13
2021-11-28 21:18:11 -08:00
Tim Dettmers
c1ed5d39b9
Fixed compilation flag for CUDA 11.0.
2021-10-21 22:30:55 -07:00
Tim Dettmers
0fb378b4ee
Added compilation from source instructions; easier compilation.
2021-10-21 17:22:43 -07:00
Tim Dettmers
a6eae2e7f2
Added skip_zeros; tests are passing.
2021-10-20 19:15:47 -07:00
Tim Dettmers
8400b58cbb
Added Kepler and fixed V100+CUDA101 support. #4 #5
2021-10-17 21:21:39 -07:00
Tim Dettmers
7923c4a066
Changed from testpypi to pypi. Release 0.0.24
2021-10-07 08:39:38 -07:00
Tim Dettmers
7439924891
Initial commit
2021-10-05 19:16:20 -07:00