Author | Commit | Message | Date
Tim Dettmers | de53588934 | Added Int8 matmul support for all GPUs. Full backward support. | 2023-02-01 20:09:31 -08:00
Tim Dettmers | 336e24696c | CUDASetup only executed once + fixed circular import. | 2023-01-02 03:31:43 -08:00
Tim Dettmers | c91f592ad7 | Merge branch 'main' into cleanup | 2023-01-02 11:19:16 +01:00
Tim Dettmers | eb028e6ebc | Fixed k-bit quantization maps. | 2022-11-19 07:24:03 -08:00
Tom Aarsen | b104ce3b62 | Merge branch 'main' into cleanup | 2022-11-17 15:22:29 +01:00
Tim Dettmers | 08fa2e7b01 | Fixed bug in cpu quant; faster GPU dequant. | 2022-11-07 18:06:18 -08:00
Tim Dettmers | e0e697b150 | Fixed blockwise test and logic. | 2022-11-06 16:36:31 -08:00
Tim Dettmers | 6bc2b992be | Added blocksizes 2048, 1024, and 512 to blockwise quant. | 2022-11-06 16:27:48 -08:00
Tim Dettmers | 2f2063bac2 | Added k<256 quantile estimate. | 2022-11-06 13:05:25 -08:00
Tim Dettmers | 98cbc4bc4f | Added k-bit fp8 map. | 2022-11-06 11:59:37 -08:00
Tim Dettmers | caf1832526 | Added k-bit linear quantization. | 2022-11-06 11:47:54 -08:00
Tim Dettmers | 1efb87d89d | Added FP8 quantization map. | 2022-11-03 19:49:50 -07:00
Tom Aarsen | 7a3c9af05d | Sort imports via isort | 2022-10-27 13:15:21 +02:00
Tom Aarsen | 0b078403ee | Simplify statements into equivalent, modern variants via pyupgrade --py37-plus (e.g., subclassing from object, super(ThisClass, self) calls, old-style syntax formatting) | 2022-10-27 13:14:13 +02:00
Tom Aarsen | 1eec77d34c | Remove trailing whitespace & ensure newline at EOF | 2022-10-27 13:11:29 +02:00
Tim Dettmers | a371be302d | Added CUDA SETUP instruction generator. | 2022-10-25 08:01:19 -07:00
Tim Dettmers | df86625a93 | Isolated CUDASetup logging; all tests green. | 2022-10-24 11:54:25 -07:00
justheuristic | 76ce9aa6da | try fp32 | 2022-09-20 06:51:25 +03:00
Tim Dettmers | 292a478716 | set threshold | 2022-09-20 06:42:05 +03:00
justheuristic | a07825ac31 | review | 2022-09-20 06:40:36 +03:00
justheuristic | cff3a71599 | cast device | 2022-09-18 01:26:25 +03:00
justheuristic | 32a9a88f98 | cast device | 2022-09-18 01:26:12 +03:00
justheuristic | 01b4c6a048 | cast device | 2022-09-18 01:25:56 +03:00
justheuristic | e4086a2758 | cast device | 2022-09-18 01:24:57 +03:00
justheuristic | 725cc72993 | cast device | 2022-09-18 01:24:44 +03:00
justheuristic | 28a9313ddc | cast before allclose | 2022-09-18 01:24:27 +03:00
justheuristic | 95dafc6475 | cast before allclose | 2022-09-18 01:22:31 +03:00
justheuristic | 37f805bb44 | debug | 2022-09-18 01:21:12 +03:00
justheuristic | 6a826c41a6 | pre-cast | 2022-09-18 01:20:34 +03:00
justheuristic | d9b8789818 | debug | 2022-09-18 01:13:58 +03:00
justheuristic | 2cd047e35d | run backward | 2022-09-18 00:55:53 +03:00
justheuristic | 591f60395a | add memory efficient backward | 2022-09-18 00:52:53 +03:00
justheuristic | f6670329fb | bump threshold to 0.21 | 2022-09-18 00:42:23 +03:00
justheuristic | fa8e07c7c5 | more lenient threshold | 2022-09-18 00:38:02 +03:00
justheuristic | e35e2c665a | cast properly | 2022-09-18 00:35:03 +03:00
justheuristic | d9ca0ed905 | un-fuse bias | 2022-09-17 23:44:28 +03:00
justheuristic | 7facedda38 | copypaste tolerances | 2022-09-17 23:41:40 +03:00
justheuristic | e29c5f5c41 | clearer assertions | 2022-09-17 23:22:04 +03:00
justheuristic | 9379df85d2 | check dtypes first | 2022-09-17 23:13:23 +03:00
justheuristic | 140cdbe876 | check dtypes first | 2022-09-17 23:12:58 +03:00
justheuristic | a9c7953e0a | cast to half before double_quant | 2022-09-17 23:10:21 +03:00
justheuristic | 469d5a631d | test_bf16 | 2022-09-17 23:06:57 +03:00
Tim Dettmers | c05dd42ddd | Fixed cpu blockwise quantization for small input tensors. | 2022-09-13 10:37:53 -07:00
Tim Dettmers | 19a7adca7a | Fixed 2^31 max size issue for cpu blockwise quant. | 2022-09-11 11:55:09 -07:00
Tim Dettmers | 7e0fb655e1 | Some initial code. Needs to be tested. | 2022-08-23 13:59:34 -07:00
Tim Dettmers | 9d60b3c527 | Fixed bug in Linear8bitLt when the bias is None. | 2022-08-17 03:45:57 -07:00
Tim Dettmers | de354f7ded | Added fused bias to matmullt. | 2022-08-16 12:00:54 -07:00
Tim Dettmers | dede343033 | Added fused bias in dequant_mm. | 2022-08-16 11:12:09 -07:00
Tim Dettmers | 1ed2fa2f21 | Removed storage() from get_ptr; added boilerplate for bias dequant_mm. | 2022-08-16 10:56:17 -07:00
Tim Dettmers | c472bd56f0 | Added the case that all env variables are empty (CUDA docker). | 2022-08-05 08:57:52 -07:00