Commit Graph

145 Commits

Author SHA1 Message Date
Tim Dettmers
69810521d3 Some small changes. 2023-03-27 09:12:57 -07:00
Phil Wang
a43cd2008d add some code in test_optim.py, although it seems to be failing 2023-03-22 09:14:05 -07:00
Max Ryabinin
dcecbb26ca Add force_no_igemmlt to test params 2023-03-22 00:28:49 +01:00
Phil Wang
8de29fc364 forget about tests for now, will test live on local enwik8 training 2023-03-09 10:11:32 -08:00
Phil Wang
cb4c3c8c66 do a bunch of typical bookkeeping before getting to main lion logic 2023-03-09 10:10:19 -08:00
Max Ryabinin
ac3ab281e3 Handle more cases in test_linear_serialization 2023-02-25 06:01:04 +01:00
Tim Dettmers
c5c38ca19c Added matmul_mixed. 2023-02-23 10:45:18 -08:00
Max Ryabinin
58b09ee1b1 [WIP] Implement proper serialization of Linear8bitLt 2023-02-21 12:04:47 +01:00
Tim Dettmers
2489d819c5 Added more blocksizes for stochastic rounding; fixed dequant blocksize. 2023-02-14 13:55:17 -08:00
Tim Dettmers
2dfa3ce16d Fixed LinearFP8 and added tests. 2023-02-13 17:48:52 -08:00
Tim Dettmers
ca3236587a Added forward/backward tests; removed bias. 2023-02-13 17:20:52 -08:00
Tim Dettmers
6bdb6c351e Added fp8 simulation layer. 2023-02-13 16:53:07 -08:00
Tim Dettmers
7f0773aede Added backprop test for Linear8bitLt and LinearFP4. 2023-02-05 06:49:54 -08:00
Tim Dettmers
c0c352b379 Added bias test for LinearFP4 and basic test. 2023-02-05 06:29:52 -08:00
Tim Dettmers
c361f84239 Fixed matmul_fp4 transpose. 2023-02-05 06:16:56 -08:00
Tim Dettmers
cfe4705e32 Added matmul_fp4 to the benchmark. 2023-02-04 22:00:04 -08:00
Tim Dettmers
13c0a4dc5d Backward matmul_fp4 passes. 2023-02-04 21:35:43 -08:00
Tim Dettmers
160a83580d Forward matmul_fp4 tests pass. 2023-02-04 21:11:21 -08:00
Tim Dettmers
3ac5840c03 Added fp4 quant/dequant and dequant optimizations. 2023-02-04 14:52:04 -08:00
Tim Dettmers
de53588934 Added Int8 matmul support for all GPUs. Full backward support. 2023-02-01 20:09:31 -08:00
Tim Dettmers
c9f505064e Added outlier detector and fake quantization layer. 2023-01-28 17:05:22 -08:00
Tim Dettmers
336e24696c CUDASetup only executed once + fixed circular import. 2023-01-02 03:31:43 -08:00
Tim Dettmers
c91f592ad7 Merge branch 'main' into cleanup 2023-01-02 11:19:16 +01:00
Tim Dettmers
eb028e6ebc Fixed k-bit quantization maps. 2022-11-19 07:24:03 -08:00
Tom Aarsen
b104ce3b62 Merge branch 'main' into cleanup 2022-11-17 15:22:29 +01:00
Tim Dettmers
08fa2e7b01 Fixed bug in cpu quant; faster GPU dequant. 2022-11-07 18:06:18 -08:00
Tim Dettmers
e0e697b150 Fixed blockwise test and logic. 2022-11-06 16:36:31 -08:00
Tim Dettmers
6bc2b992be Added blocksizes 2048, 1024, and 512 to blockwise quant. 2022-11-06 16:27:48 -08:00
Tim Dettmers
2f2063bac2 Added k<256 quantile estimate. 2022-11-06 13:05:25 -08:00
Tim Dettmers
98cbc4bc4f Added k-bit fp8 map. 2022-11-06 11:59:37 -08:00
Tim Dettmers
caf1832526 Added k-bit linear quantization. 2022-11-06 11:47:54 -08:00
Tim Dettmers
1efb87d89d Added FP8 quantization map. 2022-11-03 19:49:50 -07:00
Tom Aarsen
7a3c9af05d Sort imports via isort 2022-10-27 13:15:21 +02:00
Tom Aarsen
0b078403ee Simplify statements into equivalent, modern variants via pyupgrade --py37-plus. The changes e.g. are subclassing from object, calling super() with super(ThisClass, self), or old-style syntax formatting. 2022-10-27 13:14:13 +02:00
Tom Aarsen
1eec77d34c Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
Tim Dettmers
a371be302d Added CUDA SETUP instruction generator. 2022-10-25 08:01:19 -07:00
Tim Dettmers
df86625a93 Isolated CUDASetup logging; all tests green. 2022-10-24 11:54:25 -07:00
justheuristic
76ce9aa6da try fp32 2022-09-20 06:51:25 +03:00
Tim Dettmers
292a478716 set threshold 2022-09-20 06:42:05 +03:00
justheuristic
a07825ac31 review 2022-09-20 06:40:36 +03:00
justheuristic
cff3a71599 cast device 2022-09-18 01:26:25 +03:00
justheuristic
32a9a88f98 cast device 2022-09-18 01:26:12 +03:00
justheuristic
01b4c6a048 cast device 2022-09-18 01:25:56 +03:00
justheuristic
e4086a2758 cast device 2022-09-18 01:24:57 +03:00
justheuristic
725cc72993 cast device 2022-09-18 01:24:44 +03:00
justheuristic
28a9313ddc cast before allclose 2022-09-18 01:24:27 +03:00
justheuristic
95dafc6475 cast before allclose 2022-09-18 01:22:31 +03:00
justheuristic
37f805bb44 debug 2022-09-18 01:21:12 +03:00
justheuristic
6a826c41a6 pre-cast 2022-09-18 01:20:34 +03:00
justheuristic
d9b8789818 debug 2022-09-18 01:13:58 +03:00
justheuristic
2cd047e35d run backward 2022-09-18 00:55:53 +03:00
justheuristic
591f60395a add memory efficient backward 2022-09-18 00:52:53 +03:00
justheuristic
f6670329fb bump threshold to 0.21 2022-09-18 00:42:23 +03:00
justheuristic
fa8e07c7c5 more lenient threshold 2022-09-18 00:38:02 +03:00
justheuristic
e35e2c665a cast properly 2022-09-18 00:35:03 +03:00
justheuristic
d9ca0ed905 un-fuse bias 2022-09-17 23:44:28 +03:00
justheuristic
7facedda38 copypaste tolerances 2022-09-17 23:41:40 +03:00
justheuristic
e29c5f5c41 clearer assertions 2022-09-17 23:22:04 +03:00
justheuristic
9379df85d2 check dtypes first 2022-09-17 23:13:23 +03:00
justheuristic
140cdbe876 check dtypes first 2022-09-17 23:12:58 +03:00
justheuristic
a9c7953e0a cast to half before double_quant 2022-09-17 23:10:21 +03:00
justheuristic
469d5a631d test_bf16 2022-09-17 23:06:57 +03:00
Tim Dettmers
c05dd42ddd Fixed cpu blockwise quantization for small input tensors. 2022-09-13 10:37:53 -07:00
Tim Dettmers
19a7adca7a Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
Tim Dettmers
7e0fb655e1 Some initial code. Needs to be tested. 2022-08-23 13:59:34 -07:00
Tim Dettmers
9d60b3c527 Fixed bug in Linear8bitLt, when the bias is None. 2022-08-17 03:45:57 -07:00
Tim Dettmers
de354f7ded Added fused bias to matmullt. 2022-08-16 12:00:54 -07:00
Tim Dettmers
dede343033 Added fused bias in dequant_mm. 2022-08-16 11:12:09 -07:00
Tim Dettmers
1ed2fa2f21 Removed storage() from get_ptr; added boilerplate for bias dequant_mm. 2022-08-16 10:56:17 -07:00
Tim Dettmers
c472bd56f0 Added the case that all env variables are empty (CUDA docker). 2022-08-05 08:57:52 -07:00
Tim Dettmers
8f84674d67 Fixed bugs in cuda setup. 2022-08-04 09:16:00 -07:00
Tim Dettmers
758c7175a2 Merge branch 'debug' into cuda-bin-switch-and-cli 2022-08-04 08:03:00 -07:00
Tim Dettmers
cc5b323876 Merge branch 'extract_outliers' into debug 2022-08-04 07:40:48 -07:00
Tim Dettmers
451fd9506e Added fixes for the case that matmullt dim A is zero, e.g. [0, 768]. 2022-08-03 11:54:01 -07:00
Titus von Koeller
59a615b386 factored cuda_setup.main out into smaller modules and functions 2022-08-02 21:26:50 -07:00
Tim Dettmers
3479d02a76 Added some more docs and comments. 2022-08-01 19:43:09 -07:00
Tim Dettmers
8bf3e9faab Added full env variable search; CONDA_PREFIX priority. 2022-08-01 19:22:41 -07:00
Titus von Koeller
ea7c14f8ef reran black with linelength 80 for greater readability 2022-08-01 09:32:47 -07:00
Titus von Koeller
bfa0e33294 ran black and isort for coherent code formatting 2022-08-01 03:31:48 -07:00
Tim Dettmers
dd50382b32 Full evaluate_cuda setup with integration test. 2022-07-31 17:47:44 -07:00
Titus von Koeller
5d90b38c4d adding CLI tool for CUDA install debugging - intermediate commit 2022-07-27 21:16:04 -07:00
Tim Dettmers
5737f2b027 Merge branch 'patch_merge' into extract_outliers 2022-07-26 19:38:01 -07:00
Tim Dettmers
32fa459ed7 Added col_ampere outlier extraction kernel. 2022-07-26 18:15:51 -07:00
Tim Dettmers
bcab99ec87 Working outlier extraction for Turing. 2022-07-26 17:39:30 -07:00
Tim Dettmers
cbb901ac51 Boilerplate and test for extract_outliers. 2022-07-26 12:12:38 -07:00
Tim Dettmers
1e88edd8c0 Removed rowscale (segfaults on ampere). 2022-07-25 17:27:57 -07:00
Tim Dettmers
8b1fd32e3e Fixed makefile; fixed Ampere igemmlt_8 bug. 2022-07-25 14:02:14 -07:00
Tim Dettmers
c771b3a75a Most tests passing. 2022-07-22 14:41:05 -07:00
Max Ryabinin
33efe4a09f Remove unused imports, fix NotImplementedError 2022-06-30 18:14:20 +03:00
Tim Dettmers
20e1677dfd Added module override, bnb.nn.Embedding #13 #15 #19 2021-11-29 09:32:13 -08:00
Tim Dettmers
108cf9fc1f Fixed unsafe use of eval. #8 2021-11-29 08:21:05 -08:00
Tim Dettmers
2f8083bd8b Added AdamW. #10 #13 2021-11-28 21:18:11 -08:00
Tim Dettmers
8b3c0f355c Added adagrad with tests (no clipping). 2021-11-10 15:10:02 -08:00
Tim Dettmers
bb34fd50a1 Initial plumbing for skip_zeros. 2021-10-20 18:37:44 -07:00
Tim Dettmers
7439924891 Initial commit 2021-10-05 19:16:20 -07:00