Commit Graph

488 Commits (master)

Author SHA1 Message Date
mrq c88f97a9c8 drop support for gfx903 because depending on hipblaslt gums up too many things 2023-10-12 19:16:14 +07:00
arlo-phoenix e38b9e91b7 Revert get_cuda_version ROCM version change
not called anymore
2023-08-08 21:31:20 +07:00
arlo-phoenix c97c78bd66 Update README rocm quickstart 2023-08-08 21:28:37 +07:00
arlo-phoenix 0b481bfcc2 Use workaround for ROCm wave32 recognition
Forcefully sets __AMDGCN_WAVEFRONT_SIZE to 32.
Not correct (some GPUs don't support wave32), but works
on the supported GPUs. Can be disabled with DISABLE_WARP_32

With this blockwise quantize works and with that nf4 is supported.
2023-08-08 18:50:26 +07:00
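
Since this entry names what the workaround unblocks, here is a minimal sketch of the blockwise-quantize and NF4 round trip it refers to. Assumptions: a working ROCm build of this fork and the bitsandbytes 0.41 API; ROCm GPUs appear as "cuda" devices in PyTorch.

```python
# Minimal sketch: the blockwise-quantize and NF4 paths the wave32 workaround
# unblocks. Assumes a ROCm build of this fork on a supported AMD GPU.
import torch
import bitsandbytes.functional as F

x = torch.randn(64, 64, device="cuda")  # ROCm GPUs show up as "cuda" in PyTorch

# Blockwise 8-bit quantization round trip.
q8, state8 = F.quantize_blockwise(x)
x8 = F.dequantize_blockwise(q8, state8)

# NF4 (4-bit NormalFloat) builds on top of blockwise quantization.
q4, state4 = F.quantize_4bit(x, quant_type="nf4")
x4 = F.dequantize_4bit(q4, state4)

print((x - x8).abs().max().item(), (x - x4).abs().max().item())
```
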
arlo-phoenix 615d47583f README: Add quickstart and info section 2023-08-05 02:42:13 +07:00
arlo-phoenix 705bc024d2 Makefile: Add make hip 2023-08-05 02:41:58 +07:00
arlo-phoenix 40361ecfbb Adapt python to work with HIP 2023-08-05 02:12:48 +07:00
arlo-phoenix 3682106eb0 Algo-Direct2.h: fix hipcc issue
from https://github.com/agrocylo/bitsandbytes-rocm, thanks
2023-08-05 02:12:14 +07:00
arlo-phoenix d10197bc93 Add HIP to cuda defines
collected by hipifying all files and then comparing them with the
original CUDA files
2023-08-05 02:11:46 +07:00
Tim Dettmers 18e827d666 Version 0.41.1. 2023-08-03 20:01:10 +07:00
Tim Dettmers 3c9aca9124 Fixed two bugs in dynamic data type creation. 2023-08-03 19:47:15 +07:00
Tim Dettmers a06a0f6a08 Bumped version for new release. 2023-07-22 13:07:08 +07:00
Tim Dettmers 412fd0e717 Added better default compute_dtype handling for Linear4bit layers. 2023-07-22 12:56:29 +07:00
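
A hedged sketch of what this default concerns: Linear4bit stores weights in 4 bits but runs the matmul in compute_dtype, so the default matters for both speed and accuracy (parameter names per bitsandbytes 0.41).

```python
# Sketch of Linear4bit's compute_dtype: weights are stored 4-bit, the matmul
# itself runs in compute_dtype. Assumes bitsandbytes 0.41+ with a GPU.
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear4bit(
    768, 768,
    compute_dtype=torch.bfloat16,  # dtype of the actual computation
    quant_type="nf4",
).to("cuda")                       # quantization happens on the move to the GPU

y = layer(torch.randn(1, 768, device="cuda", dtype=torch.bfloat16))
```
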
Tim Dettmers c82f51c0f7 Increased occupancy. 2023-07-19 16:08:37 +07:00
Tim Dettmers e229fbce66 Added latest changes. 2023-07-16 21:23:57 +07:00
Tim Dettmers 7be5f2c7b3 Guard for prefetchAsync GPU capability. #470 #451 #477 2023-07-16 21:12:03 +07:00
Tim Dettmers f3232d1391 Fixed bug where read-permission was assumed for a file. #497 2023-07-16 21:08:13 +07:00
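
An illustrative guard for this class of bug (not the actual patch): check readability instead of assuming it.

```python
# Illustrative only, not the library's code: never assume read permission.
import os

def read_if_readable(path):
    # os.access avoids the PermissionError an unconditional open() can raise.
    if os.path.exists(path) and os.access(path, os.R_OK):
        with open(path) as f:
            return f.read()
    return None
```
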
Tim Dettmers 37c25c1e0d Merge branch 'main' of github.com:TimDettmers/bitsandbytes into main 2023-07-15 10:22:45 +07:00
Tim Dettmers f4996978db Added missing check if LD_LIBRARY_PATH exists. #588 2023-07-15 10:22:08 +07:00
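
The missing-check pattern, illustrated with a hypothetical snippet (not the actual diff):

```python
# Illustrative only: LD_LIBRARY_PATH may be unset, and indexing os.environ
# directly then raises KeyError at import time.
import os

# Crashes when the variable is absent:
# paths = os.environ["LD_LIBRARY_PATH"].split(":")

# Guarded version:
paths = os.environ.get("LD_LIBRARY_PATH", "").split(":")
```
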
Tim Dettmers 6102029ab9 Merge pull request #587 from BramVanroy/patch-1
replace private with public https repo URL
2023-07-15 10:04:34 +07:00
Tim Dettmers 67a3cdf652 Merge pull request #595 from ihsanturk/FIX-__main__.py-REFERENCE-TO-NONEXISTENT-get_cuda_lib_handle
Fix import crash caused by __main__.py reference to nonexistent cuda_setup.main.get_cuda_lib_handle
2023-07-15 10:04:15 +07:00
ihsanturk ce126d462d deleted references to get_cuda_lib_handle 2023-07-15 02:49:57 +07:00
ihsanturk 2f0f0e5dba get_cuda_lib_handle brought back so import works 2023-07-15 02:24:46 +07:00
Tim Dettmers 6ec4f0c374 Changed CUDA_INSTALL variable to BNB_CUDA_INSTALL. 2023-07-14 18:16:45 +07:00
Tim Dettmers 8cdec888b1 Merge pull request #593 from bilelomrani1/main
Fix bitsandbytes import error when CUDA is unavailable
2023-07-14 17:47:48 +07:00
Bilel Omrani 35dbb1ff52 Fix bitsandbytes import error when CUDA is unavailable 2023-07-15 02:04:26 +07:00
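
A hedged sketch of the guard pattern such a fix typically takes (not the exact patch):

```python
# Illustrative sketch: only run CUDA-specific setup when torch reports a GPU,
# so a plain `import bitsandbytes` still succeeds on CPU-only machines.
import torch

if torch.cuda.is_available():
    capability = torch.cuda.get_device_capability()  # e.g. (8, 0) on A100
else:
    capability = None  # CPU-only fallback: skip CUDA setup entirely
```
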
Tim Dettmers 486488bccb Bumped version. 2023-07-14 12:55:57 +07:00
Tim Dettmers 6c6e5fcb53 Added changelog entry. 2023-07-14 12:55:04 +07:00
Tim Dettmers 55f4c398a0 Polished CUDA SETUP replacement and added docs. 2023-07-14 12:50:59 +07:00
Tim Dettmers 1ab6758b36 Changed CUDA setup to use PyTorch default; added a weak test. 2023-07-13 23:58:41 +07:00
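
A sketch of the idea: derive the CUDA version from what PyTorch itself was built against instead of probing libcudart. The library file-name scheme below is an assumption for illustration.

```python
# Hedged sketch: pick the native binary from PyTorch's own CUDA version
# (torch.version.cuda) rather than probing libcudart. The file-name scheme
# here is an assumption for illustration.
import torch

cuda = torch.version.cuda  # e.g. "11.8"; None on CPU-only builds
libname = (
    f"libbitsandbytes_cuda{cuda.replace('.', '')}.so"
    if cuda is not None
    else "libbitsandbytes_cpu.so"
)
print(libname)  # e.g. libbitsandbytes_cuda118.so
```
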
Tim Dettmers ac155f7415 Merge branch 'main' into bugfixes 2023-07-13 21:55:35 +07:00
Tim Dettmers e8df8d64a2 Merge pull request #375 from rapsealk/fix/libcuda-to-torch
Replace libcudart.so with PyTorch's CUDA APIs
2023-07-13 21:54:47 +07:00
Tim Dettmers c00402f17e Fixed a bug in absmax float conversion. 2023-07-13 21:47:38 +07:00
Tim Dettmers 6689afaec4 Merge pull request #567 from apbard/patch-1
[BugFix] replace view+contiguous with reshape
2023-07-13 21:45:00 +07:00
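
For context, a hedged illustration of why reshape can replace the view+contiguous pair:

```python
# reshape handles non-contiguous tensors (copying only when necessary),
# which is exactly what the contiguous()+view() pair was emulating.
import torch

x = torch.randn(4, 6).t()       # transpose makes x non-contiguous
# x.view(2, 12)                 # RuntimeError: view needs contiguous memory
y = x.contiguous().view(2, 12)  # old pattern: explicit copy, then view
z = x.reshape(2, 12)            # replacement: copies only if needed
assert torch.equal(y, z)
```
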
Tim Dettmers 67475257a9 Added documentation for NF4; failing 8-bit matmul; fixed absmax bug. #529 #543 2023-07-13 21:41:43 +07:00
Tim Dettmers 8a20cd864b Added missing scipy requirement. Addressing #544 2023-07-13 21:25:07 +07:00
Tim Dettmers 097b1cc5da Fixed bug caused by undefined default type of absmax. #553 2023-07-13 21:23:33 +07:00
Tim Dettmers 7b6cfe1738 Added H100 support for CUDA 11.8 precompiled binaries. 2023-07-13 21:16:23 +07:00
Bram Vanroy 91c4fd844b add public git repo URL 2023-07-14 00:51:05 +07:00
Tim Dettmers 817bdf6325 Bumped version after hotfix. 2023-07-11 17:16:05 +07:00
Tim Dettmers 90b0ac57b0 Fixed missing bias in bnb.matmul_4bit for inference; more tests. 2023-07-11 17:13:33 +07:00
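
A minimal usage sketch of the fixed call path, mirroring how Linear4bit invokes it (bitsandbytes 0.41 API):

```python
# Minimal sketch of bnb.matmul_4bit with an explicit bias during inference,
# mirroring Linear4bit's forward. Assumes bitsandbytes 0.41 on a GPU.
import torch
import bitsandbytes as bnb
import bitsandbytes.functional as F

A = torch.randn(1, 768, device="cuda", dtype=torch.float16)
W = torch.randn(768, 768, device="cuda", dtype=torch.float16)
bias = torch.randn(768, device="cuda", dtype=torch.float16)

qW, state = F.quantize_4bit(W, quant_type="nf4")
out = bnb.matmul_4bit(A, qW.t(), quant_state=state, bias=bias)  # bias now applied
```
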
Tim Dettmers dc96e9e7c8 Test for bloom that fails with inference kernels. 2023-07-11 15:40:20 +07:00
Tim Dettmers ae7cd6ad14 Bump version. 2023-07-11 05:58:25 +07:00
Tim Dettmers ba51d95d43 Added more extensive gemv tests; blocksize guard for gemv. 2023-07-11 05:55:49 +07:00
Tim Dettmers b8da4a165a Bump on version. 2023-07-10 16:40:22 +07:00
Tim Dettmers a26a321e07 Removed debugging statement. 2023-07-10 14:34:19 +07:00
Tim Dettmers 306f6b2362 Fixed accidental deletion of limits in kernel. 2023-07-10 14:24:33 +07:00
Tim Dettmers 2221f4cee0 Fixed potential memory leak. 2023-07-10 13:57:44 +07:00
Tim Dettmers 490153b29f Added generation tests. 2023-07-10 12:19:16 +07:00
Tim Dettmers 1c774ecebb Added ARCH guard for bfloat16 computations. 2023-07-10 09:53:23 +07:00