Tim Dettmers | a371be302d | Added CUDA SETUP instruction generator. | 2022-10-25 08:01:19 -07:00 |
Tim Dettmers | df86625a93 | Isolated CUDASetup logging; all tests green. | 2022-10-24 11:54:25 -07:00 |
justheuristic | 76ce9aa6da | try fp32 | 2022-09-20 06:51:25 +03:00 |
Tim Dettmers | 292a478716 | set threshold | 2022-09-20 06:42:05 +03:00 |
justheuristic | a07825ac31 | review | 2022-09-20 06:40:36 +03:00 |
justheuristic | cff3a71599 | cast device | 2022-09-18 01:26:25 +03:00 |
justheuristic | 32a9a88f98 | cast device | 2022-09-18 01:26:12 +03:00 |
justheuristic | 01b4c6a048 | cast device | 2022-09-18 01:25:56 +03:00 |
justheuristic | e4086a2758 | cast device | 2022-09-18 01:24:57 +03:00 |
justheuristic | 725cc72993 | cast device | 2022-09-18 01:24:44 +03:00 |
justheuristic | 28a9313ddc | cast before allclose | 2022-09-18 01:24:27 +03:00 |
justheuristic | 95dafc6475 | cast before allclose | 2022-09-18 01:22:31 +03:00 |
justheuristic | 37f805bb44 | debug | 2022-09-18 01:21:12 +03:00 |
justheuristic | 6a826c41a6 | pre-cast | 2022-09-18 01:20:34 +03:00 |
justheuristic | d9b8789818 | debug | 2022-09-18 01:13:58 +03:00 |
justheuristic | 2cd047e35d | run backward | 2022-09-18 00:55:53 +03:00 |
justheuristic | 591f60395a | add memory efficient backward | 2022-09-18 00:52:53 +03:00 |
justheuristic | f6670329fb | bump threshold to 0.21 | 2022-09-18 00:42:23 +03:00 |
justheuristic | fa8e07c7c5 | more lenient threshold | 2022-09-18 00:38:02 +03:00 |
justheuristic | e35e2c665a | cast properly | 2022-09-18 00:35:03 +03:00 |
justheuristic | d9ca0ed905 | un-fuse bias | 2022-09-17 23:44:28 +03:00 |
justheuristic | 7facedda38 | copypaste tolerances | 2022-09-17 23:41:40 +03:00 |
justheuristic | e29c5f5c41 | clearer assertions | 2022-09-17 23:22:04 +03:00 |
justheuristic | 9379df85d2 | check dtypes first | 2022-09-17 23:13:23 +03:00 |
justheuristic | 140cdbe876 | check dtypes first | 2022-09-17 23:12:58 +03:00 |
justheuristic | a9c7953e0a | cast to half before double_quant | 2022-09-17 23:10:21 +03:00 |
justheuristic | 469d5a631d | test_bf16 | 2022-09-17 23:06:57 +03:00 |
Tim Dettmers | c05dd42ddd | Fixed cpu blockwise quantization for small input tensors. | 2022-09-13 10:37:53 -07:00 |
Tim Dettmers | 19a7adca7a | Fixed 2^31 max size issue for cpu blockwise quant. | 2022-09-11 11:55:09 -07:00 |
Tim Dettmers | 7e0fb655e1 | Some initial code. Needs to be tested. | 2022-08-23 13:59:34 -07:00 |
Tim Dettmers | 9d60b3c527 | Fixed bug in Linear8bitLt, when the bias is None. | 2022-08-17 03:45:57 -07:00 |
Tim Dettmers | de354f7ded | Added fused bias to matmullt. | 2022-08-16 12:00:54 -07:00 |
Tim Dettmers | dede343033 | Added fused bias in dequant_mm. | 2022-08-16 11:12:09 -07:00 |
Tim Dettmers | 1ed2fa2f21 | Removed storage() from get_ptr; added boilerplate for bias dequant_mm. | 2022-08-16 10:56:17 -07:00 |
Tim Dettmers | c472bd56f0 | Added the case that all env variables are empty (CUDA docker). | 2022-08-05 08:57:52 -07:00 |
Tim Dettmers | 8f84674d67 | Fixed bugs in cuda setup. | 2022-08-04 09:16:00 -07:00 |
Tim Dettmers | 758c7175a2 | Merge branch 'debug' into cuda-bin-switch-and-cli | 2022-08-04 08:03:00 -07:00 |
Tim Dettmers | cc5b323876 | Merge branch 'extract_outliers' into debug | 2022-08-04 07:40:48 -07:00 |
Tim Dettmers | 451fd9506e | Added fixes for the case that matmullt dim A is zero, e.g. [0, 768]. | 2022-08-03 11:54:01 -07:00 |
Titus von Koeller | 59a615b386 | factored cuda_setup.main out into smaller modules and functions | 2022-08-02 21:26:50 -07:00 |
Tim Dettmers | 3479d02a76 | Added some more docs and comments. | 2022-08-01 19:43:09 -07:00 |
Tim Dettmers | 8bf3e9faab | Added full env variable search; CONDA_PREFIX priority. | 2022-08-01 19:22:41 -07:00 |
Titus von Koeller | ea7c14f8ef | reran black with linelength 80 for greater readability | 2022-08-01 09:32:47 -07:00 |
Titus von Koeller | bfa0e33294 | ran black and isort for coherent code formatting | 2022-08-01 03:31:48 -07:00 |
Tim Dettmers | dd50382b32 | Full evaluate_cuda setup with integration test. | 2022-07-31 17:47:44 -07:00 |
Titus von Koeller | 5d90b38c4d | adding CLI tool for CUDA install debugging - intermediate commit | 2022-07-27 21:16:04 -07:00 |
Tim Dettmers | 5737f2b027 | Merge branch 'patch_merge' into extract_outliers | 2022-07-26 19:38:01 -07:00 |
Tim Dettmers | 32fa459ed7 | Added col_ampere outlier extraction kernel. | 2022-07-26 18:15:51 -07:00 |
Tim Dettmers | bcab99ec87 | Working outlier extraction for Turing. | 2022-07-26 17:39:30 -07:00 |
Tim Dettmers | cbb901ac51 | Boilerplate and test for extract_outliers. | 2022-07-26 12:12:38 -07:00 |
Tim Dettmers | 1e88edd8c0 | Removed rowscale (segfaults on ampere). | 2022-07-25 17:27:57 -07:00 |
Tim Dettmers | 8b1fd32e3e | Fixed makefile; fixed Ampere igemmlt_8 bug. | 2022-07-25 14:02:14 -07:00 |
Tim Dettmers | c771b3a75a | Most tests passing. | 2022-07-22 14:41:05 -07:00 |
Max Ryabinin | 33efe4a09f | Remove unused imports, fix NotImplementedError | 2022-06-30 18:14:20 +03:00 |
Tim Dettmers | 20e1677dfd | Added module override, bnb.nn.Embedding #13 #15 #19 | 2021-11-29 09:32:13 -08:00 |
Tim Dettmers | 108cf9fc1f | Fixed unsafe use of eval. #8 | 2021-11-29 08:21:05 -08:00 |
Tim Dettmers | 2f8083bd8b | Added AdamW. #10 #13 | 2021-11-28 21:18:11 -08:00 |
Tim Dettmers | 8b3c0f355c | Added adagrad with tests (no clipping). | 2021-11-10 15:10:02 -08:00 |
Tim Dettmers | bb34fd50a1 | Initial plumbing for skip_zeros. | 2021-10-20 18:37:44 -07:00 |
Tim Dettmers | 7439924891 | Initial commit | 2021-10-05 19:16:20 -07:00 |