Commit Graph

  • c88f97a9c8 drop support for gfx903 because depending on hipblaslt gums up too many things mrq 2023-10-12 19:16:14 -0500
  • e38b9e91b7 Revert get_cuda_version ROCM version change arlo-phoenix 2023-08-08 21:31:20 +0200
  • c97c78bd66 Update README rocm quickstart arlo-phoenix 2023-08-08 21:28:37 +0200
  • 0b481bfcc2 Use workaround for ROCm wave32 recognition arlo-phoenix 2023-08-08 18:50:26 +0000
  • 615d47583f README: Add quickstart and info section arlo-phoenix 2023-08-05 02:13:25 +0200
  • 705bc024d2 Makefile: Add make hip arlo-phoenix 2023-08-05 02:41:58 +0200
  • 40361ecfbb Adapt python to work with HIP arlo-phoenix 2023-08-05 02:12:48 +0200
  • 3682106eb0 Algo-Direct2.h: fix hipcc issue arlo-phoenix 2023-08-05 02:12:14 +0200
  • d10197bc93 Add HIP to cuda defines arlo-phoenix 2023-08-05 02:11:46 +0200
  • 18e827d666 Version 0.41.1. Tim Dettmers 2023-08-03 20:01:10 -0700
  • 3c9aca9124 Fixed two bugs in dynamic data type creation. Tim Dettmers 2023-08-03 19:47:15 -0700
  • a06a0f6a08 Bumped version for new release. Tim Dettmers 2023-07-22 13:07:08 -0700
  • 412fd0e717 Added better default compute_dtype handling for Linear4bit layers. Tim Dettmers 2023-07-22 12:56:29 -0700
  • c82f51c0f7 Increased occupancy. Tim Dettmers 2023-07-19 16:08:37 -0700
  • e229fbce66 Added latest changes. Tim Dettmers 2023-07-16 21:23:57 -0700
  • 7be5f2c7b3 Guard for prefetchAsync GPU capability. #470 #451 #477 Tim Dettmers 2023-07-16 21:12:03 -0700
  • f3232d1391 Fixed bug where read-permission was assumed for a file. #497 Tim Dettmers 2023-07-16 21:08:13 -0700
  • 37c25c1e0d Merge branch 'main' of github.com:TimDettmers/bitsandbytes into main Tim Dettmers 2023-07-15 10:22:45 -0700
  • f4996978db Added missing check if LD_LIBRARY_PATH exists. #588 Tim Dettmers 2023-07-15 10:22:08 -0700
  • 6102029ab9 Merge pull request #587 from BramVanroy/patch-1 Tim Dettmers 2023-07-15 10:04:34 -0700
  • 67a3cdf652 Merge pull request #595 from ihsanturk/FIX-__main__.py-REFERENCE-TO-NONEXISTENT-get_cuda_lib_handle Tim Dettmers 2023-07-15 10:04:15 -0700
  • ce126d462d deleted references to get_cuda_lib_handle ihsanturk 2023-07-15 02:49:57 -0700
  • 2f0f0e5dba get_cuda_lib_handle brought back so import works ihsanturk 2023-07-15 02:24:46 -0700
  • 6ec4f0c374 Changed CUDA_INSTALL variable to BNB_CUDA_INSTALL. Tim Dettmers 2023-07-14 18:16:45 -0700
  • 8cdec888b1 Merge pull request #593 from bilelomrani1/main Tim Dettmers 2023-07-14 17:47:48 -0700
  • 35dbb1ff52 Fix bitsandbytes import error when CUDA is unavailable Bilel Omrani 2023-07-15 02:04:26 +0200
  • 486488bccb Bumped version. Tim Dettmers 2023-07-14 12:55:57 -0700
  • 6c6e5fcb53 Added changelog entry. Tim Dettmers 2023-07-14 12:55:04 -0700
  • 55f4c398a0 Polished CUDA SETUP replacement and added docs. Tim Dettmers 2023-07-14 12:50:59 -0700
  • 1ab6758b36 Changed CUDA setup to use PyTorch default; added a weak test. Tim Dettmers 2023-07-13 23:58:41 -0700
  • ac155f7415 Merge branch 'main' into bugfixes Tim Dettmers 2023-07-13 21:55:35 -0700
  • e8df8d64a2 Merge pull request #375 from rapsealk/fix/libcuda-to-torch Tim Dettmers 2023-07-13 21:54:47 -0700
  • c00402f17e Fixed a bug in absmax float conversion. Tim Dettmers 2023-07-13 21:47:38 -0700
  • 6689afaec4 Merge pull request #567 from apbard/patch-1 Tim Dettmers 2023-07-13 21:45:00 -0700
  • 67475257a9 Added documentation for NF4; failing 8-bit matmul; fixed absmax bug. #529 #543 Tim Dettmers 2023-07-13 21:41:43 -0700
  • 8a20cd864b Added missing scipy requirement. Addressing #544 Tim Dettmers 2023-07-13 21:25:07 -0700
  • 097b1cc5da Fixed bug caused by undefined default type of absmax. #553 Tim Dettmers 2023-07-13 21:23:33 -0700
  • 7b6cfe1738 Added H100 support for CUDA 11.8 precompiled binaries. Tim Dettmers 2023-07-13 21:16:23 -0700
  • 91c4fd844b add public git repo URL Bram Vanroy 2023-07-14 00:51:05 +0200
  • 817bdf6325 Bumped version after hotfix. Tim Dettmers 2023-07-11 17:16:05 -0700
  • 90b0ac57b0 Fixed missing bias in bnb.matmul_4bit for inference; more tests. Tim Dettmers 2023-07-11 17:13:33 -0700
  • dc96e9e7c8 Test for bloom that fails with inference kernels. Tim Dettmers 2023-07-11 15:40:20 -0700
  • ae7cd6ad14 Bump version. Tim Dettmers 2023-07-11 05:58:25 -0700
  • ba51d95d43 Added more extensive gemv tests; blocksize guard for gemv. Tim Dettmers 2023-07-11 05:55:49 -0700
  • b8da4a165a Bump on version. Tim Dettmers 2023-07-10 16:40:22 -0700
  • a26a321e07 Removed debugging statement. Tim Dettmers 2023-07-10 14:34:19 -0700
  • 306f6b2362 Fixed accidental deletion of limits in kernel. Tim Dettmers 2023-07-10 14:24:33 -0700
  • 2221f4cee0 Fixed potential memory leak. Tim Dettmers 2023-07-10 13:57:44 -0700
  • 490153b29f Added generation tests. Tim Dettmers 2023-07-10 12:19:16 -0700
  • 1c774ecebb Added ARCH guard for bfloat16 computations. Tim Dettmers 2023-07-10 09:53:23 -0700
  • 0a1cced375 Fixed typo in cuda_install.sh. Tim Dettmers 2023-07-10 06:40:19 -0700
  • 0d344b70ba Changelog and version bump. Tim Dettmers 2023-07-10 06:38:57 -0700
  • 73aa4e0a33 Fixed Makefile and added CUDA 12.2 install. Tim Dettmers 2023-07-10 06:34:04 -0700
  • 5f492d437e Merge remote-tracking branch 'origin/inference' Tim Dettmers 2023-07-10 06:24:24 -0700
  • 196d6f5dc1 Merge pull request #469 from shadeMe/linear-layer-device Tim Dettmers 2023-07-10 06:17:13 -0700
  • 5fab673442 Added fp32 compute type for gemv_4bit. Tim Dettmers 2023-07-09 21:06:01 -0700
  • cef519c89e Added test for Param4bit.to() and fixed double quant behavior. Tim Dettmers 2023-07-09 17:16:50 -0700
  • 6a905be5ce Fixed a bug where gemv_4bit would return a wrongly sized tensor. Tim Dettmers 2023-07-09 15:34:02 -0700
  • 0f0390acb2 Added double quantization support and tests. Tim Dettmers 2023-07-09 15:32:03 -0700
  • 94168d79d7 Added FP4 fast inference support. Tim Dettmers 2023-07-09 14:46:19 -0700
  • 4b88d69de7 Added arbitrary data types; fixed a bug for small matrices. Tim Dettmers 2023-07-09 12:04:09 -0700
  • eefbf60270 Tuning optimization (float accumulation). 185 vs 50. Tim Dettmers 2023-07-08 16:31:58 -0700
  • 7e49b5b938 Added warp_shuffle indexing 185 vs 54. Tim Dettmers 2023-07-08 14:27:12 -0700
  • 463630dc73 [BugFix] replace view+continuous with reshape Alessandro Pietro Bardelli 2023-07-06 12:26:03 +0200
  • a24aae30bf Merge branch 'main' into fix/libcuda-to-torch Jeongseok Kang 2023-07-06 15:43:42 +0900
  • 02fd80cb81 Added bfloat16 quantizations and tests. Tim Dettmers 2023-07-04 19:58:31 -0700
  • dfe6900b94 Vectorized loads, conflict free NF4; 52 vs 172. Tim Dettmers 2023-07-04 15:20:10 -0700
  • f89ff93e26 Initial 4-bit naive batch size 1, 81 vs 185. Tim Dettmers 2023-07-03 18:45:38 -0700
  • 4395d68cf6 Release 0.39.1. Tim Dettmers 2023-06-19 19:40:41 -0700
  • 2d321a7524 Merge pull request #503 from TimDettmers/efficient_8bit_serialize Tim Dettmers 2023-06-19 11:28:30 -0700
  • b599fdb197 Only rearrange weight if it exists Max Ryabinin 2023-06-14 19:27:13 +0200
  • c1f3f56d2c Rearrange the weights directly in state dict before loading Max Ryabinin 2023-06-09 21:58:39 +0200
  • f734076e94 Improve memory efficiency of 8-bit serialization Max Ryabinin 2023-06-09 21:39:57 +0200
  • 4fb37d45c1 Extract get_tile_inds to a separate function Max Ryabinin 2023-06-09 21:39:37 +0200
  • db49ad43ab Add device parameter to Embedding shadeMe 2023-06-01 17:43:49 +0200
  • 9cac5dd1b6 Add device parameter to Linear subclasses shadeMe 2023-06-01 17:43:30 +0200
  • e54d2730fc Added debugging functions. Tim Dettmers 2023-05-30 20:42:21 -0700
  • b7f04e2a20 Added lookup table. Tim Dettmers 2023-05-30 20:07:05 -0700
  • ac5550a023 Added changes for deployment. Tim Dettmers 2023-05-30 19:06:59 -0700
  • 0f40fa3f0a Bumped version. Tim Dettmers 2023-05-23 19:55:52 -0700
  • 1b8772a8f3 Added PagedLion and bf16 Lion. Tim Dettmers 2023-05-23 19:37:38 -0700
  • 2bce175d15 Fixed Makefile. Tim Dettmers 2023-05-23 18:42:19 -0700
  • 4bd1151829 Fixed gradient accumulation test. Tim Dettmers 2023-05-07 15:06:17 -0700
  • 675baa79d2 Merge remote-tracking branch 'origin/main' into merge Tim Dettmers 2023-05-07 13:34:03 -0700
  • f64cfe65aa Fixed prefetch bug for non-paged tensors; added benchmark. Tim Dettmers 2023-05-06 21:49:16 -0700
  • 41a9c70814 Changed prefetching. Tim Dettmers 2023-05-06 18:59:59 -0700
  • 44d68ff29c Added paged optimizers. Tim Dettmers 2023-05-06 14:59:29 -0700
  • ec38ba95b0 Added paging. Tim Dettmers 2023-05-06 11:14:06 -0700
  • 264a948539 4-bit draft; 128 vector load 240. Tim Dettmers 2023-05-02 16:15:38 -0700
  • 869b7e83b5 Warp multi-specialization 240. Tim Dettmers 2023-05-02 12:10:32 -0700
  • 77f15fdce9 Shared memory efficient 240. Tim Dettmers 2023-05-02 11:38:11 -0700
  • 89cccd8196 A tile multi-tiling. Tim Dettmers 2023-05-02 09:40:31 -0700
  • 4decb3cc68 Removed unnecessary sync. Tim Dettmers 2023-05-02 09:38:14 -0700
  • 394749db71 Correct implementation 240. Tim Dettmers 2023-05-02 08:58:59 -0700
  • 9aa232cc39 Initial. Tim Dettmers 2023-05-02 07:53:29 -0700
  • 9192c9de64 Tighter and scaled error analysis. Tim Dettmers 2023-05-02 07:50:32 -0700
  • f9bfea8f23 Baseline for debugging. Tim Dettmers 2023-05-02 07:24:12 -0700
  • 7bfa09d0fc 8x32 240 6 warps. Tim Dettmers 2023-05-01 16:38:09 -0700
  • 3d4a2eadd3 16x16 240. Tim Dettmers 2023-05-01 16:23:45 -0700
  • 7cc8ff4727 Warp specialization 362. Tim Dettmers 2023-05-01 08:21:12 -0700
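
For reference, a flat listing in the same hash / subject / author / date shape as the entries above can be reproduced locally with git's standard pretty-format placeholders (%h abbreviated hash, %s subject, %an author name, %ad author date); the web UI's graph edges and branch badges are not part of this output:

    git log --graph --pretty=format:'%h %s %an %ad' --date=iso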