Tom Aarsen | 0b078403ee | Simplify statements into equivalent, modern variants via pyupgrade --py37-plus. The changes include, e.g., dropping explicit subclassing from object, replacing super(ThisClass, self) with super(), and updating old-style string formatting. | 2022-10-27 13:14:13 +02:00
Tom Aarsen | 1eec77d34c | Remove trailing whitespace & ensure newline at EOF | 2022-10-27 13:11:29 +02:00
Tim Dettmers | a371be302d | Added CUDA SETUP instruction generator. | 2022-10-25 08:01:19 -07:00
Tim Dettmers | df86625a93 | Isolated CUDASetup logging; all tests green. | 2022-10-24 11:54:25 -07:00
justheuristic | 76ce9aa6da | try fp32 | 2022-09-20 06:51:25 +03:00
Tim Dettmers | 292a478716 | set threshold | 2022-09-20 06:42:05 +03:00
justheuristic | a07825ac31 | review | 2022-09-20 06:40:36 +03:00
justheuristic | cff3a71599 | cast device | 2022-09-18 01:26:25 +03:00
justheuristic | 32a9a88f98 | cast device | 2022-09-18 01:26:12 +03:00
justheuristic | 01b4c6a048 | cast device | 2022-09-18 01:25:56 +03:00
justheuristic | e4086a2758 | cast device | 2022-09-18 01:24:57 +03:00
justheuristic | 725cc72993 | cast device | 2022-09-18 01:24:44 +03:00
justheuristic | 28a9313ddc | cast before allclose | 2022-09-18 01:24:27 +03:00
justheuristic | 95dafc6475 | cast before allclose | 2022-09-18 01:22:31 +03:00
justheuristic | 37f805bb44 | debug | 2022-09-18 01:21:12 +03:00
justheuristic | 6a826c41a6 | pre-cast | 2022-09-18 01:20:34 +03:00
justheuristic | d9b8789818 | debug | 2022-09-18 01:13:58 +03:00
justheuristic | 2cd047e35d | run backward | 2022-09-18 00:55:53 +03:00
justheuristic | 591f60395a | add memory efficient backward | 2022-09-18 00:52:53 +03:00
justheuristic | f6670329fb | bump threshold to 0.21 | 2022-09-18 00:42:23 +03:00
justheuristic | fa8e07c7c5 | more lenient threshold | 2022-09-18 00:38:02 +03:00
justheuristic | e35e2c665a | cast properly | 2022-09-18 00:35:03 +03:00
justheuristic | d9ca0ed905 | un-fuse bias | 2022-09-17 23:44:28 +03:00
justheuristic | 7facedda38 | copypaste tolerances | 2022-09-17 23:41:40 +03:00
justheuristic | e29c5f5c41 | clearer assertions | 2022-09-17 23:22:04 +03:00
justheuristic | 9379df85d2 | check dtypes first | 2022-09-17 23:13:23 +03:00
justheuristic | 140cdbe876 | check dtypes first | 2022-09-17 23:12:58 +03:00
justheuristic | a9c7953e0a | cast to half before double_quant | 2022-09-17 23:10:21 +03:00
justheuristic | 469d5a631d | test_bf16 | 2022-09-17 23:06:57 +03:00
Tim Dettmers | c05dd42ddd | Fixed cpu blockwise quantization for small input tensors. | 2022-09-13 10:37:53 -07:00
Tim Dettmers | 19a7adca7a | Fixed 2^31 max size issue for cpu blockwise quant. | 2022-09-11 11:55:09 -07:00
Tim Dettmers | 7e0fb655e1 | Some initial code. Needs to be tested. | 2022-08-23 13:59:34 -07:00
Tim Dettmers | 9d60b3c527 | Fixed bug in Linear8bitLt, when the bias is None. | 2022-08-17 03:45:57 -07:00
Tim Dettmers | de354f7ded | Added fused bias to matmullt. | 2022-08-16 12:00:54 -07:00
Tim Dettmers | dede343033 | Added fused bias in dequant_mm. | 2022-08-16 11:12:09 -07:00
Tim Dettmers | 1ed2fa2f21 | Removed storage() from get_ptr; added boilerplate for bias dequant_mm. | 2022-08-16 10:56:17 -07:00
Tim Dettmers | c472bd56f0 | Added the case that all env variables are empty (CUDA docker). | 2022-08-05 08:57:52 -07:00
Tim Dettmers | 8f84674d67 | Fixed bugs in cuda setup. | 2022-08-04 09:16:00 -07:00
Tim Dettmers | 758c7175a2 | Merge branch 'debug' into cuda-bin-switch-and-cli | 2022-08-04 08:03:00 -07:00
Tim Dettmers | cc5b323876 | Merge branch 'extract_outliers' into debug | 2022-08-04 07:40:48 -07:00
Tim Dettmers | 451fd9506e | Added fixes for the case that matmullt dim A is zero, e.g. [0, 768]. | 2022-08-03 11:54:01 -07:00
Titus von Koeller | 59a615b386 | factored cuda_setup.main out into smaller modules and functions | 2022-08-02 21:26:50 -07:00
Tim Dettmers | 3479d02a76 | Added some more docs and comments. | 2022-08-01 19:43:09 -07:00
Tim Dettmers | 8bf3e9faab | Added full env variable search; CONDA_PREFIX priority. | 2022-08-01 19:22:41 -07:00
Titus von Koeller | ea7c14f8ef | reran black with linelength 80 for greater readability | 2022-08-01 09:32:47 -07:00
Titus von Koeller | bfa0e33294 | ran black and isort for coherent code formatting | 2022-08-01 03:31:48 -07:00
Tim Dettmers | dd50382b32 | Full evaluate_cuda setup with integration test. | 2022-07-31 17:47:44 -07:00
Titus von Koeller | 5d90b38c4d | adding CLI tool for CUDA install debugging - intermediate commit | 2022-07-27 21:16:04 -07:00
Tim Dettmers | 5737f2b027 | Merge branch 'patch_merge' into extract_outliers | 2022-07-26 19:38:01 -07:00
Tim Dettmers | 32fa459ed7 | Added col_ampere outlier extraction kernel. | 2022-07-26 18:15:51 -07:00