Commit Graph

310 Commits

Author SHA1 Message Date
arlo-phoenix
e38b9e91b7 Revert get_cuda_version ROCM version change
not called anymore
2023-08-08 21:31:20 +02:00
arlo-phoenix
0b481bfcc2 Use workaround for ROCm wave32 recognition
Forcibly sets __AMDGCN_WAVEFRONT_SIZE to 32.
Not correct in general (some GPUs don't support wave32), but works
on the GPUs that do. Can be disabled with DISABLE_WARP_32.

With this, blockwise quantization works, and with that NF4 is supported.
2023-08-08 18:50:26 +00:00
arlo-phoenix
40361ecfbb Adapt python to work with HIP 2023-08-05 02:12:48 +02:00
Tim Dettmers
3c9aca9124 Fixed two bugs in dynamic data type creation. 2023-08-03 19:47:15 -07:00
Tim Dettmers
412fd0e717 Added better default compute_dtype handling for Linear4bit layers. 2023-07-22 12:56:29 -07:00
Tim Dettmers
c82f51c0f7 Increased occupancy. 2023-07-19 16:08:37 -07:00
Tim Dettmers
f3232d1391 Fixed bug where read-permission was assumed for a file. #497 2023-07-16 21:08:13 -07:00
Tim Dettmers
37c25c1e0d Merge branch 'main' of github.com:TimDettmers/bitsandbytes into main 2023-07-15 10:22:45 -07:00
Tim Dettmers
f4996978db Added missing check if LD_LIBRARY_PATH exists. #588 2023-07-15 10:22:08 -07:00
Tim Dettmers
6102029ab9 Merge pull request #587 from BramVanroy/patch-1
replace private with public https repo URL
2023-07-15 10:04:34 -07:00
ihsanturk
ce126d462d deleted references to get_cuda_lib_handle 2023-07-15 02:49:57 -07:00
ihsanturk
2f0f0e5dba get_cuda_lib_handle brought back so import works 2023-07-15 02:24:46 -07:00
Tim Dettmers
6ec4f0c374 Changed CUDA_INSTALL variable to BNB_CUDA_INSTALL. 2023-07-14 18:16:45 -07:00
Bilel Omrani
35dbb1ff52 Fix bitsandbytes import error when CUDA is unavailable 2023-07-15 02:04:26 +02:00
Tim Dettmers
55f4c398a0 Polished CUDA SETUP replacement and added docs. 2023-07-14 12:50:59 -07:00
Tim Dettmers
1ab6758b36 Changed CUDA setup to use PyTorch default; added a weak test. 2023-07-13 23:58:41 -07:00
Tim Dettmers
ac155f7415 Merge branch 'main' into bugfixes 2023-07-13 21:55:35 -07:00
Tim Dettmers
e8df8d64a2 Merge pull request #375 from rapsealk/fix/libcuda-to-torch
Replace libcudart.so with PyTorch's CUDA APIs
2023-07-13 21:54:47 -07:00
Tim Dettmers
c00402f17e Fixed a bug in absmax float conversion. 2023-07-13 21:47:38 -07:00
Tim Dettmers
6689afaec4 Merge pull request #567 from apbard/patch-1
[BugFix] replace view+continuous with reshape
2023-07-13 21:45:00 -07:00
Tim Dettmers
67475257a9 Added documentation for NF4; failing 8-bit matmul; fixed absmax bug. #529 #543 2023-07-13 21:41:43 -07:00
Tim Dettmers
097b1cc5da Fixed bug caused by undefined default type of absmax. #553 2023-07-13 21:23:33 -07:00
Bram Vanroy
91c4fd844b add public git repo URL 2023-07-14 00:51:05 +02:00
Tim Dettmers
90b0ac57b0 Fixed missing bias in bnb.matmul_4bit for inference; more tests. 2023-07-11 17:13:33 -07:00
Tim Dettmers
ba51d95d43 Added more extensive gemv tests; blocksize guard for gemv. 2023-07-11 05:55:49 -07:00
Tim Dettmers
5f492d437e Merge remote-tracking branch 'origin/inference' 2023-07-10 06:24:24 -07:00
Tim Dettmers
196d6f5dc1 Merge pull request #469 from shadeMe/linear-layer-device
Add `device` parameter to `Linear` subclasses and `Embedding`
2023-07-10 06:17:13 -07:00
Tim Dettmers
5fab673442 Added fp32 compute type for gemv_4bit. 2023-07-09 21:06:01 -07:00
Tim Dettmers
cef519c89e Added test for Param4bit.to() and fixed double quant behavior. 2023-07-09 17:16:50 -07:00
Tim Dettmers
6a905be5ce Fixed a bug where gemv_4bit would return a wrongly sized tensor. 2023-07-09 15:34:02 -07:00
Tim Dettmers
0f0390acb2 Added double quantization support and tests. 2023-07-09 15:32:03 -07:00
Tim Dettmers
94168d79d7 Added FP4 fast inference support. 2023-07-09 14:46:19 -07:00
Tim Dettmers
4b88d69de7 Added arbitrary data types; fixed a bug for small matrices. 2023-07-09 12:04:09 -07:00
Alessandro Pietro Bardelli
463630dc73 [BugFix] replace view+continuous with reshape 2023-07-06 12:26:03 +02:00
Jeongseok Kang
a24aae30bf Merge branch 'main' into fix/libcuda-to-torch 2023-07-06 15:43:42 +09:00
Tim Dettmers
02fd80cb81 Added bfloat16 quantizations and tests. 2023-07-04 19:58:31 -07:00
Tim Dettmers
f89ff93e26 Initial 4-bit naive batch size 1, 81 vs 185. 2023-07-03 18:45:38 -07:00
Max Ryabinin
b599fdb197 Only rearrange weight if it exists 2023-06-14 19:27:13 +02:00
Max Ryabinin
c1f3f56d2c Rearrange the weights directly in state dict before loading 2023-06-09 21:58:39 +02:00
Max Ryabinin
f734076e94 Improve memory efficiency of 8-bit serialization 2023-06-09 21:39:57 +02:00
Max Ryabinin
4fb37d45c1 Extract get_tile_inds to a separate function 2023-06-09 21:39:37 +02:00
shadeMe
db49ad43ab Add device parameter to Embedding 2023-06-01 17:43:49 +02:00
shadeMe
9cac5dd1b6 Add device parameter to Linear subclasses 2023-06-01 17:43:30 +02:00
Tim Dettmers
1b8772a8f3 Added PagedLion and bf16 Lion. 2023-05-23 19:37:38 -07:00
Tim Dettmers
2bce175d15 Fixed Makefile. 2023-05-23 18:42:19 -07:00
Tim Dettmers
4bd1151829 Fixed gradient accumulation test. 2023-05-07 15:06:17 -07:00
Tim Dettmers
675baa79d2 Merge remote-tracking branch 'origin/main' into merge 2023-05-07 13:34:03 -07:00
Tim Dettmers
f64cfe65aa Fixed prefetch bug for non-paged tensors; added benchmark. 2023-05-06 21:49:16 -07:00
Tim Dettmers
41a9c70814 Changed prefetching. 2023-05-06 18:59:59 -07:00
Tim Dettmers
44d68ff29c Added paged optimizers. 2023-05-06 14:59:29 -07:00