Commit Graph

488 Commits (master)

Author SHA1 Message Date
mrq c88f97a9c8 drop support for gfx903 because depending on hipblaslt gums up too many things 2023-10-12 19:16:14 +07:00
arlo-phoenix e38b9e91b7 Revert get_cuda_version ROCM version change
not called anymore
2023-08-08 21:31:20 +07:00
arlo-phoenix c97c78bd66 Update README rocm quickstart 2023-08-08 21:28:37 +07:00
arlo-phoenix 0b481bfcc2 Use workaround for ROCm wave32 recognition
Forcefully sets __AMDGCN_WAVEFRONT_SIZE to 32.
Not correct (some GPUs don't support wave32), but works
on the supported GPUs. Can be disabled with DISABLE_WARP_32

With this blockwise quantize works and with that nf4 is supported.
2023-08-08 18:50:26 +07:00
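
Since this entry names what the workaround unblocks, here is a minimal sketch of the blockwise-quantize and NF4 round trip it refers to. Assumptions: a working ROCm build of this fork and the bitsandbytes 0.41 API; ROCm GPUs appear as "cuda" devices in PyTorch.

```python
# Minimal sketch: the blockwise-quantize and NF4 paths the wave32 workaround
# unblocks. Assumes a ROCm build of this fork on a supported AMD GPU.
import torch
import bitsandbytes.functional as F

x = torch.randn(64, 64, device="cuda")  # ROCm GPUs show up as "cuda" in PyTorch

# Blockwise 8-bit quantization round trip.
q8, state8 = F.quantize_blockwise(x)
x8 = F.dequantize_blockwise(q8, state8)

# NF4 (4-bit NormalFloat) builds on top of blockwise quantization.
q4, state4 = F.quantize_4bit(x, quant_type="nf4")
x4 = F.dequantize_4bit(q4, state4)

print((x - x8).abs().max().item(), (x - x4).abs().max().item())
```
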
arlo-phoenix 615d47583f README: Add quickstart and info section 2023-08-05 02:42:13 +07:00
arlo-phoenix 705bc024d2 Makefile: Add make hip 2023-08-05 02:41:58 +07:00
arlo-phoenix 40361ecfbb Adapt python to work with HIP 2023-08-05 02:12:48 +07:00
arlo-phoenix 3682106eb0 Algo-Direct2.h: fix hipcc issue
from https://github.com/agrocylo/bitsandbytes-rocm, thanks
2023-08-05 02:12:14 +07:00
arlo-phoenix d10197bc93 Add HIP to cuda defines
collected by hipifying all files and then comparing them with the
original CUDA files
2023-08-05 02:11:46 +07:00
Tim Dettmers 18e827d666 Version 0.41.1. 2023-08-03 20:01:10 +07:00
Tim Dettmers 3c9aca9124 Fixed two bugs in dynamic data type creation. 2023-08-03 19:47:15 +07:00
Tim Dettmers a06a0f6a08 Bumped version for new release. 2023-07-22 13:07:08 +07:00
Tim Dettmers 412fd0e717 Added better default compute_dtype handling for Linear4bit layers. 2023-07-22 12:56:29 +07:00
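
A hedged sketch of what this default concerns: Linear4bit stores weights in 4 bits but runs the matmul in compute_dtype, so the default matters for both speed and accuracy (parameter names per bitsandbytes 0.41).

```python
# Sketch of Linear4bit's compute_dtype: weights are stored 4-bit, the matmul
# itself runs in compute_dtype. Assumes bitsandbytes 0.41+ with a GPU.
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear4bit(
    768, 768,
    compute_dtype=torch.bfloat16,  # dtype of the actual computation
    quant_type="nf4",
).to("cuda")                       # quantization happens on the move to the GPU

y = layer(torch.randn(1, 768, device="cuda", dtype=torch.bfloat16))
```
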
Tim Dettmers c82f51c0f7 Increased occupancy. 2023-07-19 16:08:37 +07:00
Tim Dettmers e229fbce66 Added latest changes. 2023-07-16 21:23:57 +07:00
Tim Dettmers 7be5f2c7b3 Guard for prefetchAsync GPU capability. #470 #451 #477 2023-07-16 21:12:03 +07:00
Tim Dettmers f3232d1391 Fixed bug where read-permission was assumed for a file. #497 2023-07-16 21:08:13 +07:00
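
An illustrative guard for this class of bug (not the actual patch): check readability instead of assuming it.

```python
# Illustrative only, not the library's code: never assume read permission.
import os

def read_if_readable(path):
    # os.access avoids the PermissionError an unconditional open() can raise.
    if os.path.exists(path) and os.access(path, os.R_OK):
        with open(path) as f:
            return f.read()
    return None
```
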
Tim Dettmers 37c25c1e0d Merge branch 'main' of github.com:TimDettmers/bitsandbytes into main 2023-07-15 10:22:45 +07:00
Tim Dettmers f4996978db Added missing check if LD_LIBRARY_PATH exists. #588 2023-07-15 10:22:08 +07:00
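
The missing-check pattern, illustrated with a hypothetical snippet (not the actual diff):

```python
# Illustrative only: LD_LIBRARY_PATH may be unset, and indexing os.environ
# directly then raises KeyError at import time.
import os

# Crashes when the variable is absent:
# paths = os.environ["LD_LIBRARY_PATH"].split(":")

# Guarded version:
paths = os.environ.get("LD_LIBRARY_PATH", "").split(":")
```
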
Tim Dettmers 6102029ab9 Merge pull request #587 from BramVanroy/patch-1
replace private with public https repo URL
2023-07-15 10:04:34 +07:00
Tim Dettmers 67a3cdf652 Merge pull request #595 from ihsanturk/FIX-__main__.py-REFERENCE-TO-NONEXISTENT-get_cuda_lib_handle
Fix import crash caused by __main__.py reference to nonexistent cuda_setup.main.get_cuda_lib_handle
2023-07-15 10:04:15 +07:00
ihsanturk ce126d462d deleted references to get_cuda_lib_handle 2023-07-15 02:49:57 +07:00
ihsanturk 2f0f0e5dba get_cuda_lib_handle brought back so import works 2023-07-15 02:24:46 +07:00
Tim Dettmers 6ec4f0c374 Changed CUDA_INSTALL variable to BNB_CUDA_INSTALL. 2023-07-14 18:16:45 +07:00
Tim Dettmers 8cdec888b1 Merge pull request #593 from bilelomrani1/main
Fix bitsandbytes import error when CUDA is unavailable
2023-07-14 17:47:48 +07:00
Bilel Omrani 35dbb1ff52 Fix bitsandbytes import error when CUDA is unavailable 2023-07-15 02:04:26 +07:00
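
A hedged sketch of the guard pattern such a fix typically takes (not the exact patch):

```python
# Illustrative sketch: only run CUDA-specific setup when torch reports a GPU,
# so a plain `import bitsandbytes` still succeeds on CPU-only machines.
import torch

if torch.cuda.is_available():
    capability = torch.cuda.get_device_capability()  # e.g. (8, 0) on A100
else:
    capability = None  # CPU-only fallback: skip CUDA setup entirely
```
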
Tim Dettmers 486488bccb Bumped version. 2023-07-14 12:55:57 +07:00
Tim Dettmers 6c6e5fcb53 Added changelog entry. 2023-07-14 12:55:04 +07:00
Tim Dettmers 55f4c398a0 Polished CUDA SETUP replacement and added docs. 2023-07-14 12:50:59 +07:00
Tim Dettmers 1ab6758b36 Changed CUDA setup to use PyTorch default; added a weak test. 2023-07-13 23:58:41 +07:00
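
A sketch of the idea: derive the CUDA version from what PyTorch itself was built against instead of probing libcudart. The library file-name scheme below is an assumption for illustration.

```python
# Hedged sketch: pick the native binary from PyTorch's own CUDA version
# (torch.version.cuda) rather than probing libcudart. The file-name scheme
# here is an assumption for illustration.
import torch

cuda = torch.version.cuda  # e.g. "11.8"; None on CPU-only builds
libname = (
    f"libbitsandbytes_cuda{cuda.replace('.', '')}.so"
    if cuda is not None
    else "libbitsandbytes_cpu.so"
)
print(libname)  # e.g. libbitsandbytes_cuda118.so
```
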
Tim Dettmers ac155f7415 Merge branch 'main' into bugfixes 2023-07-13 21:55:35 +07:00
Tim Dettmers e8df8d64a2 Merge pull request #375 from rapsealk/fix/libcuda-to-torch
Replace libcudart.so with PyTorch's CUDA APIs
2023-07-13 21:54:47 +07:00
Tim Dettmers c00402f17e Fixed a bug in absmax float conversion. 2023-07-13 21:47:38 +07:00
Tim Dettmers 6689afaec4 Merge pull request #567 from apbard/patch-1
[BugFix] replace view+contiguous with reshape
2023-07-13 21:45:00 +07:00
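
For context, a hedged illustration of why reshape can replace the view+contiguous pair:

```python
# reshape handles non-contiguous tensors (copying only when necessary),
# which is exactly what the contiguous()+view() pair was emulating.
import torch

x = torch.randn(4, 6).t()       # transpose makes x non-contiguous
# x.view(2, 12)                 # RuntimeError: view needs contiguous memory
y = x.contiguous().view(2, 12)  # old pattern: explicit copy, then view
z = x.reshape(2, 12)            # replacement: copies only if needed
assert torch.equal(y, z)
```
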
Tim Dettmers 67475257a9 Added documentation for NF4; failing 8-bit matmul; fixed absmax bug. #529 #543 2023-07-13 21:41:43 +07:00
Tim Dettmers 8a20cd864b Added missing scipy requirement. Addressing #544 2023-07-13 21:25:07 +07:00
Tim Dettmers 097b1cc5da Fixed bug caused by undefined default type of absmax. #553 2023-07-13 21:23:33 +07:00
Tim Dettmers 7b6cfe1738 Added H100 support for CUDA 11.8 precompiled binaries. 2023-07-13 21:16:23 +07:00
Bram Vanroy 91c4fd844b add public git repo URL 2023-07-14 00:51:05 +07:00
Tim Dettmers 817bdf6325 Bumped version after hotfix. 2023-07-11 17:16:05 +07:00
Tim Dettmers 90b0ac57b0 Fixed missing bias in bnb.matmul_4bit for inference; more tests. 2023-07-11 17:13:33 +07:00
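
A minimal usage sketch of the fixed call path, mirroring how Linear4bit invokes it (bitsandbytes 0.41 API):

```python
# Minimal sketch of bnb.matmul_4bit with an explicit bias during inference,
# mirroring Linear4bit's forward. Assumes bitsandbytes 0.41 on a GPU.
import torch
import bitsandbytes as bnb
import bitsandbytes.functional as F

A = torch.randn(1, 768, device="cuda", dtype=torch.float16)
W = torch.randn(768, 768, device="cuda", dtype=torch.float16)
bias = torch.randn(768, device="cuda", dtype=torch.float16)

qW, state = F.quantize_4bit(W, quant_type="nf4")
out = bnb.matmul_4bit(A, qW.t(), quant_state=state, bias=bias)  # bias now applied
```
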
Tim Dettmers dc96e9e7c8 Test for bloom that fails with inference kernels. 2023-07-11 15:40:20 +07:00
Tim Dettmers ae7cd6ad14 Bump version. 2023-07-11 05:58:25 +07:00
Tim Dettmers ba51d95d43 Added more extensive gemv tests; blocksize guard for gemv. 2023-07-11 05:55:49 +07:00
Tim Dettmers b8da4a165a Bump on version. 2023-07-10 16:40:22 +07:00
Tim Dettmers a26a321e07 Removed debugging statement. 2023-07-10 14:34:19 +07:00
Tim Dettmers 306f6b2362 Fixed accidental deletion of limits in kernel. 2023-07-10 14:24:33 +07:00
Tim Dettmers 2221f4cee0 Fixed potential memory leak. 2023-07-10 13:57:44 +07:00
Tim Dettmers 490153b29f Added generation tests. 2023-07-10 12:19:16 +07:00
Tim Dettmers 1c774ecebb Added ARCH guard for bfloat16 computations. 2023-07-10 09:53:23 +07:00