Tim Dettmers
|
4cd63deff3
|
Fixed CUDA Conda PyTorch 2.0 issues.
|
2023-04-11 12:10:20 -07:00 |
|
Tim Dettmers
|
2bb5c00ba9
|
Added pre/post call to all lib calls. Fixes #120
|
2023-04-11 09:36:56 -07:00 |
|
Tim Dettmers
|
2eb3108356
|
Fixed bug where beta2 was not passed into Lion 32-bit.
|
2023-04-11 09:16:01 -07:00 |
|
Tim Dettmers
|
ed6f3eb146
|
Merge pull request #159 from TimDettmers/serialize_8bit
Implement proper serialization of Linear8bitLt
|
2023-04-11 07:24:51 -07:00 |
|
Tim Dettmers
|
b0ec20c3b3
|
Merge pull request #188 from lucidrains/main
Lion 8 bit
|
2023-04-11 07:22:45 -07:00 |
|
Tim Dettmers
|
d3e0e39def
|
Merge pull request #190 from svgsponer/Fix#157
Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist
|
2023-04-11 07:20:16 -07:00 |
|
Tim Dettmers
|
c7875533ce
|
Merge pull request #213 from tonylins/dev/fix_no_absmax
Fix a bug in (de)quantize_no_absmax with multiple GPUs
|
2023-04-11 07:18:24 -07:00 |
|
Tim Dettmers
|
6b4c5afe21
|
Merge pull request #260 from rapsealk/fix_libsbitsandbytes_cpu_so
Fixed typo libsbitsandbytes_cpu.so
|
2023-04-11 07:15:42 -07:00 |
|
justheuristic
|
5e456be50e
|
Support 1650, 1660
|
2023-04-10 21:26:52 +03:00 |
|
Mitchell Wortsman
|
d677a71607
|
typo
|
2023-04-08 19:36:17 +00:00 |
|
Mitchell Wortsman
|
da524d97c9
|
mem efficient
|
2023-04-08 19:34:18 +00:00 |
|
Tim Dettmers
|
e9fa03b717
|
Some fixes for loading PEFT modules with Params4bit.
|
2023-04-07 09:59:21 -07:00 |
|
Jeongseok Kang
|
8cceff72db
|
Fixed typo libsbitsandbytes_cpu.so
|
2023-04-05 09:28:41 +09:00 |
|
Tim Dettmers
|
1ccb7bdec6
|
Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure.
|
2023-04-03 18:47:00 -07:00 |
|
Tim Dettmers
|
4ea489d3bf
|
Refactor FP4 into 4Bit and integrate NF4 data type.
|
2023-04-03 11:00:12 -07:00 |
|
Tim Dettmers
|
64cc05920d
|
First draft of NF4.
|
2023-04-02 16:10:35 -07:00 |
|
Tim Dettmers
|
4ad999d144
|
Added quantization tree generation.
|
2023-04-02 14:42:45 -07:00 |
|
Tim Dettmers
|
0d332a641f
|
Added normal with extra value.
|
2023-04-02 14:09:08 -07:00 |
|
Tim Dettmers
|
51a21df728
|
Added 8-bit compression to quantization statistics.
|
2023-04-01 16:10:18 -07:00 |
|
Mitchell Wortsman
|
7f87ba83ee
|
cleaning and refactor
|
2023-04-01 18:46:04 +00:00 |
|
Tim Dettmers
|
c4cfe4fbdd
|
Added bf16 Adam.
|
2023-04-01 10:33:03 -07:00 |
|
Tim Dettmers
|
a13a522c4c
|
Added first triton test.
|
2023-03-31 11:20:54 -07:00 |
|
Tim Dettmers
|
8645d1f71c
|
Added normal quant.
|
2023-03-29 18:41:37 -07:00 |
|
Mitchell Wortsman
|
5f3d9ada8d
|
triton-v1
|
2023-03-29 06:47:08 +00:00 |
|
Tim Dettmers
|
69810521d3
|
Some small changes.
|
2023-03-27 09:12:57 -07:00 |
|
Mitchell Wortsman
|
51f8bb7133
|
pre-triton update
|
2023-03-24 05:44:42 +00:00 |
|
Ji Lin
|
b6383ba116
|
fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus
|
2023-03-22 22:14:57 -04:00 |
|
Severin Gsponer
|
c4866ab06e
|
Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist
|
2023-03-11 15:35:23 +01:00 |
|
Phil Wang
|
19b9ef34b9
|
whoops
|
2023-03-10 08:59:49 -08:00 |
|
Phil Wang
|
c99b44f774
|
do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense)
|
2023-03-10 08:57:59 -08:00 |
|
Phil Wang
|
c83888aa1a
|
use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions
|
2023-03-09 11:54:54 -08:00 |
|
Phil Wang
|
cb4c3c8c66
|
do a bunch of typical bookkeeping before getting to main lion logic
|
2023-03-09 10:10:19 -08:00 |
|
Phil Wang
|
d43ea9722c
|
make sure interface is correct
|
2023-03-09 09:45:33 -08:00 |
|
Phil Wang
|
7247cb4554
|
initial commit, slowly work from interface into the kernel
|
2023-03-09 08:08:46 -08:00 |
|
Artidoro Pagnoni
|
6c31a5fe99
|
t5 model fix
|
2023-02-27 14:23:21 -08:00 |
|
Max Ryabinin
|
24609b66af
|
Reduce diff
|
2023-02-25 06:24:58 +01:00 |
|
Max Ryabinin
|
d15822a54b
|
Refactor _tile_indices into a cached property, fix device bug
|
2023-02-25 06:23:07 +01:00 |
|
Max Ryabinin
|
cc608c04c2
|
Revert the layout if weights were reordered
|
2023-02-25 06:02:06 +01:00 |
|
Max Ryabinin
|
cd4d904a4c
|
Raise an error when loading a quantized checkpoint before quantization
|
2023-02-25 06:01:34 +01:00 |
|
Tim Dettmers
|
9851a10b46
|
Added cast to fp4 layer for speed.
|
2023-02-24 10:17:57 -08:00 |
|
Mitchell Wortsman
|
75377d125e
|
new experiments
|
2023-02-24 00:10:15 +00:00 |
|
Tim Dettmers
|
5d2e23e8d6
|
Merge branch 'fp8sim' of github.com:TimDettmers/bitsandbytes into fp8sim
|
2023-02-23 10:56:49 -08:00 |
|
Tim Dettmers
|
c5c38ca19c
|
Added matmul_mixed.
|
2023-02-23 10:45:18 -08:00 |
|
Mitchell Wortsman
|
3fbf60ad83
|
sim now worse than real
|
2023-02-23 08:27:15 +00:00 |
|
Max Ryabinin
|
58b09ee1b1
|
[WIP] Implement proper serialization of Linear8bitLt
|
2023-02-21 12:04:47 +01:00 |
|
Mitchell Wortsman
|
7b764d3569
|
adding half() cast
|
2023-02-21 03:53:44 +00:00 |
|
Tim Dettmers
|
2489d819c5
|
Added more blocksizes for stochastic rounding; fixed dequant blocksize.
|
2023-02-14 13:55:17 -08:00 |
|
Tim Dettmers
|
c93a90d075
|
Fixed FP4 import and data type conversion in backward.
|
2023-02-14 13:31:39 -08:00 |
|
Tim Dettmers
|
2dfa3ce16d
|
Fixed LinearFP8 and added tests.
|
2023-02-13 17:48:52 -08:00 |
|
Tim Dettmers
|
fa255cbc56
|
Added missing import.
|
2023-02-13 17:29:39 -08:00 |
|
Tim Dettmers
|
ca3236587a
|
Added forward/backward tests; removed bias.
|
2023-02-13 17:20:52 -08:00 |
|
Tim Dettmers
|
6bdb6c351e
|
Added fp8 simulation layer.
|
2023-02-13 16:53:07 -08:00 |
|
Tim Dettmers
|
c0c352b379
|
Added bias test for LinearFP4 and basic test.
|
2023-02-05 06:29:52 -08:00 |
|
Tim Dettmers
|
c361f84239
|
Fixed matmul_fp4 transpose.
|
2023-02-05 06:16:56 -08:00 |
|
Tim Dettmers
|
cfe4705e32
|
Added matmul_fp4 to the benchmark.
|
2023-02-04 22:00:04 -08:00 |
|
Tim Dettmers
|
13c0a4dc5d
|
Backward matmul_fp4 passes.
|
2023-02-04 21:35:43 -08:00 |
|
Tim Dettmers
|
160a83580d
|
Forward matmul_fp4 tests pass.
|
2023-02-04 21:11:21 -08:00 |
|
Tim Dettmers
|
3ac5840c03
|
Added fp4 quant/dequant and dequant optimizations.
|
2023-02-04 14:52:04 -08:00 |
|
Kashif Rasul
|
c52365ac1d
|
Merge branch 'main' into patch-1
|
2023-02-03 09:01:48 +01:00 |
|
Tim Dettmers
|
0f5c394870
|
Added version 0.37.0.
|
2023-02-01 20:27:01 -08:00 |
|
Tim Dettmers
|
de53588934
|
Added Int8 matmul support for all GPUs. Full backward support.
|
2023-02-01 20:09:31 -08:00 |
|
Tim Dettmers
|
c9f505064e
|
Added outlier detector and fake quantization layer.
|
2023-01-28 17:05:22 -08:00 |
|
Kashif Rasul
|
59bf8fcff2
|
fix CUDASetup call
|
2023-01-04 17:47:18 +01:00 |
|
Kashif Rasul
|
792f6213a7
|
Fix for python 3.7
|
2023-01-04 17:38:33 +01:00 |
|
Tim Dettmers
|
1341fb44ad
|
Fixed issue where the CUDA SETUP was not printed.
|
2023-01-04 03:50:53 -08:00 |
|
Tim Dettmers
|
b3de19218e
|
Added error message for unexpected CUDA exception.
|
2023-01-03 06:57:07 -08:00 |
|
Tim Dettmers
|
81990491ff
|
Merge pull request #113 from Borzik/fix-warnings
Import missing warn function
|
2023-01-03 15:46:58 +01:00 |
|
Tim Dettmers
|
9180b4cc11
|
Added additional error message for cudart error #85
|
2023-01-03 06:44:11 -08:00 |
|
Tim Dettmers
|
211ad594df
|
Added error+instructions for unsupported CUDA 10.0 version #82
|
2023-01-03 06:07:35 -08:00 |
|
Felix Borzik
|
f3800bab75
|
import warn function
|
2023-01-03 13:23:34 +00:00 |
|
Tim Dettmers
|
9d353ca786
|
Merge pull request #87 from lostmsu/main
Add `device` and `dtype` parameters to `StableEmbedding`
|
2023-01-02 13:22:45 +01:00 |
|
Tim Dettmers
|
7a6563b6c8
|
Default to CPU library on CUDA error+small refactor.
|
2023-01-02 03:47:09 -08:00 |
|
Tim Dettmers
|
d9112dc55b
|
Merge pull request #110 from BlackHC/cublaslt_version
Improve cc version detection for cublaslt
|
2023-01-02 12:35:53 +01:00 |
|
Tim Dettmers
|
336e24696c
|
CUDASetup only executed once + fixed circular import.
|
2023-01-02 03:31:43 -08:00 |
|
Tim Dettmers
|
be5cecb88f
|
Merge branch 'main' into main
|
2023-01-02 11:23:17 +01:00 |
|
Tim Dettmers
|
c91f592ad7
|
Merge branch 'main' into cleanup
|
2023-01-02 11:19:16 +01:00 |
|
blackhc
|
ed17aa9a31
|
Don't mark it as failure though.
|
2022-12-29 23:50:48 +00:00 |
|
blackhc
|
7b39a5511d
|
Fix issue #97
|
2022-12-29 23:47:21 +00:00 |
|
Tim Dettmers
|
c059bd2848
|
Added additional blocksizes: {64, 128, 256}.
|
2022-11-20 14:18:15 -08:00 |
|
Tim Dettmers
|
eb028e6ebc
|
Fixed k-bit quantization maps.
|
2022-11-19 07:24:03 -08:00 |
|
Tom Aarsen
|
b104ce3b62
|
Merge branch 'main' into cleanup
|
2022-11-17 15:22:29 +01:00 |
|
Tim Dettmers
|
08fa2e7b01
|
Fixed bug in cpu quant; faster GPU dequant.
|
2022-11-07 18:06:18 -08:00 |
|
Tim Dettmers
|
62a333ac40
|
Added pre/post calls to quantize_blockwise.
|
2022-11-06 17:17:51 -08:00 |
|
Tim Dettmers
|
e0e697b150
|
Fixed blockwise test and logic.
|
2022-11-06 16:36:31 -08:00 |
|
Tim Dettmers
|
6bc2b992be
|
Added blocksizes 2048, 1024, and 512 to blockwise quant.
|
2022-11-06 16:27:48 -08:00 |
|
Tim Dettmers
|
2f2063bac2
|
Added k<256 quantile estimate.
|
2022-11-06 13:05:25 -08:00 |
|
Tim Dettmers
|
98cbc4bc4f
|
Added k-bit fp8 map.
|
2022-11-06 11:59:37 -08:00 |
|
Tim Dettmers
|
caf1832526
|
Added k-bit linear quantization.
|
2022-11-06 11:47:54 -08:00 |
|
Victor Nova
|
62d39a237c
|
add device and dtype parameters to StableEmbedding
|
2022-11-04 14:12:46 -07:00 |
|
Tim Dettmers
|
1efb87d89d
|
Added FP8 quantization map.
|
2022-11-03 19:49:50 -07:00 |
|
Tom Aarsen
|
62c0bd2278
|
Fix several typos in logging and comments
Via codespell
|
2022-11-01 09:53:47 +01:00 |
|
Tom Aarsen
|
d504050ff7
|
Call isort over cuda_setup/main.py
|
2022-11-01 09:46:03 +01:00 |
|
Tom Aarsen
|
30f28b94a0
|
Merge branch 'main' into cleanup
|
2022-11-01 09:43:49 +01:00 |
|
Tim Dettmers
|
8d87c0b852
|
Fixed CUDA setup bugs, including #81.
|
2022-10-31 18:04:49 -07:00 |
|
adpkadspokasdk
|
8724c990c7
|
allow hiding of the welcome message
|
2022-10-27 16:04:49 -06:00 |
|
Tim Dettmers
|
4844aef4ff
|
Fixing bad error when GPU was not detected for #73.
|
2022-10-27 08:54:30 -07:00 |
|
Tom Aarsen
|
c6dad28a08
|
Remove extraneous get_ptr calls
|
2022-10-27 13:53:16 +02:00 |
|
Tom Aarsen
|
7727fa4c8c
|
Remove f-prefix from strings that don't use formatting
|
2022-10-27 13:36:39 +02:00 |
|
Tom Aarsen
|
54bd6ed1d6
|
Remove unused imports
|
2022-10-27 13:32:01 +02:00 |
|
Tom Aarsen
|
ef70f2adcd
|
Fix bad indentation
|
2022-10-27 13:27:18 +02:00 |
|