166 Commits

Author SHA1 Message Date
Tim Dettmers
98cbc4bc4f Added k-bit fp8 map. 2022-11-06 11:59:37 -08:00
Tim Dettmers
caf1832526 Added k-bit linear quantization. 2022-11-06 11:47:54 -08:00
Tim Dettmers
1efb87d89d Added FP8 quantization map. 2022-11-03 19:49:50 -07:00
Tom Aarsen
7a3c9af05d Sort imports via isort. 2022-10-27 13:15:21 +02:00
Tom Aarsen
0b078403ee Simplify statements into equivalent, modern variants via pyupgrade --py37-plus. The changes e.g. are subclassing from object, calling super() with super(ThisClass, self), or old-style syntax formatting. 2022-10-27 13:14:13 +02:00
Tom Aarsen
1eec77d34c Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
Tim Dettmers
a371be302d Added CUDA SETUP instruction generator. 2022-10-25 08:01:19 -07:00
Tim Dettmers
df86625a93 Isolated CUDASetup logging; all tests green. 2022-10-24 11:54:25 -07:00
justheuristic
76ce9aa6da try fp32 2022-09-20 06:51:25 +03:00
Tim Dettmers
292a478716 set threshold 2022-09-20 06:42:05 +03:00
justheuristic
a07825ac31 review 2022-09-20 06:40:36 +03:00
justheuristic
cff3a71599 cast device 2022-09-18 01:26:25 +03:00
justheuristic
32a9a88f98 cast device 2022-09-18 01:26:12 +03:00
justheuristic
01b4c6a048 cast device 2022-09-18 01:25:56 +03:00
justheuristic
e4086a2758 cast device 2022-09-18 01:24:57 +03:00
justheuristic
725cc72993 cast device 2022-09-18 01:24:44 +03:00
justheuristic
28a9313ddc cast before allclose 2022-09-18 01:24:27 +03:00
justheuristic
95dafc6475 cast before allclose 2022-09-18 01:22:31 +03:00
justheuristic
37f805bb44 debug 2022-09-18 01:21:12 +03:00
justheuristic
6a826c41a6 pre-cast 2022-09-18 01:20:34 +03:00
justheuristic
d9b8789818 debug 2022-09-18 01:13:58 +03:00
justheuristic
2cd047e35d run backward 2022-09-18 00:55:53 +03:00
justheuristic
591f60395a add memory efficient backward 2022-09-18 00:52:53 +03:00
justheuristic
f6670329fb bump threshold to 0.21 2022-09-18 00:42:23 +03:00
justheuristic
fa8e07c7c5 more lenient threshold 2022-09-18 00:38:02 +03:00
justheuristic
e35e2c665a cast properly 2022-09-18 00:35:03 +03:00
justheuristic
d9ca0ed905 un-fuse bias 2022-09-17 23:44:28 +03:00
justheuristic
7facedda38 copypaste tolerances 2022-09-17 23:41:40 +03:00
justheuristic
e29c5f5c41 clearer assertions 2022-09-17 23:22:04 +03:00
justheuristic
9379df85d2 check dtypes first 2022-09-17 23:13:23 +03:00
justheuristic
140cdbe876 check dtypes first 2022-09-17 23:12:58 +03:00
justheuristic
a9c7953e0a cast to half before double_quant 2022-09-17 23:10:21 +03:00
justheuristic
469d5a631d test_bf16 2022-09-17 23:06:57 +03:00
Tim Dettmers
c05dd42ddd Fixed cpu blockwise quantization for small input tensors. 2022-09-13 10:37:53 -07:00
Tim Dettmers
19a7adca7a Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
Tim Dettmers
7e0fb655e1 Some initial code. Needs to be tested. 2022-08-23 13:59:34 -07:00
Tim Dettmers
9d60b3c527 Fixed bug in Linear8bitLt, when the bias is None. 2022-08-17 03:45:57 -07:00
Tim Dettmers
de354f7ded Added fused bias to matmullt. 2022-08-16 12:00:54 -07:00
Tim Dettmers
dede343033 Added fused bias in dequant_mm. 2022-08-16 11:12:09 -07:00
Tim Dettmers
1ed2fa2f21 Removed storage() from get_ptr; added boilerplate for bias dequant_mm. 2022-08-16 10:56:17 -07:00
Tim Dettmers
c472bd56f0 Added the case that all env variables are empty (CUDA docker). 2022-08-05 08:57:52 -07:00
Tim Dettmers
8f84674d67 Fixed bugs in cuda setup. 2022-08-04 09:16:00 -07:00
Tim Dettmers
758c7175a2 Merge branch 'debug' into cuda-bin-switch-and-cli 2022-08-04 08:03:00 -07:00
Tim Dettmers
cc5b323876 Merge branch 'extract_outliers' into debug 2022-08-04 07:40:48 -07:00
Tim Dettmers
451fd9506e Added fixes for the case that matmullt dim A is zero, e.g. [0, 768]. 2022-08-03 11:54:01 -07:00
Titus von Koeller
59a615b386 factored cuda_setup.main out into smaller modules and functions 2022-08-02 21:26:50 -07:00
Tim Dettmers
3479d02a76 Added some more docs and comments. 2022-08-01 19:43:09 -07:00
Tim Dettmers
8bf3e9faab Added full env variable search; CONDA_PREFIX priority. 2022-08-01 19:22:41 -07:00
Titus von Koeller
ea7c14f8ef reran black with linelength 80 for greater readability 2022-08-01 09:32:47 -07:00
Titus von Koeller
bfa0e33294 ran black and isort for coherent code formatting 2022-08-01 03:31:48 -07:00
Tim Dettmers
dd50382b32 Full evaluate_cuda setup with integration test. 2022-07-31 17:47:44 -07:00
Titus von Koeller
5d90b38c4d adding CLI tool for CUDA install debugging - intermediate commit 2022-07-27 21:16:04 -07:00
Tim Dettmers
5737f2b027 Merge branch 'patch_merge' into extract_outliers 2022-07-26 19:38:01 -07:00
Tim Dettmers
32fa459ed7 Added col_ampere outlier extraction kernel. 2022-07-26 18:15:51 -07:00
Tim Dettmers
bcab99ec87 Working outlier extraction for Turing. 2022-07-26 17:39:30 -07:00
Tim Dettmers
cbb901ac51 Boilerplate and test for extract_outliers. 2022-07-26 12:12:38 -07:00
Tim Dettmers
1e88edd8c0 Removed rowscale (segfaults on ampere). 2022-07-25 17:27:57 -07:00
Tim Dettmers
8b1fd32e3e Fixed makefile; fixed Ampere igemmlt_8 bug. 2022-07-25 14:02:14 -07:00
Tim Dettmers
c771b3a75a Most tests passing. 2022-07-22 14:41:05 -07:00
Max Ryabinin
33efe4a09f Remove unused imports, fix NotImplementedError 2022-06-30 18:14:20 +03:00
Tim Dettmers
20e1677dfd Added module override, bnb.nn.Embedding #13 #15 #19 2021-11-29 09:32:13 -08:00
Tim Dettmers
108cf9fc1f Fixed unsafe use of eval. #8 2021-11-29 08:21:05 -08:00
Tim Dettmers
2f8083bd8b Added AdamW. #10 #13 2021-11-28 21:18:11 -08:00
Tim Dettmers
8b3c0f355c Added adagrad with tests (no clipping). 2021-11-10 15:10:02 -08:00
Tim Dettmers
bb34fd50a1 Initial plumbing for skip_zeros. 2021-10-20 18:37:44 -07:00
Tim Dettmers
7439924891 Initial commit 2021-10-05 19:16:20 -07:00