Commit Graph

461 Commits

Author SHA1 Message Date
Tim Dettmers
486488bccb Bumped version. 2023-07-14 12:55:57 -07:00
Tim Dettmers
6c6e5fcb53 Added changelog entry. 2023-07-14 12:55:04 -07:00
Tim Dettmers
55f4c398a0 Polished CUDA SETUP replacement and added docs. 2023-07-14 12:50:59 -07:00
Tim Dettmers
1ab6758b36 Changed CUDA setup to use PyTorch default; added a weak test. 2023-07-13 23:58:41 -07:00
Tim Dettmers
ac155f7415 Merge branch 'main' into bugfixes 2023-07-13 21:55:35 -07:00
Tim Dettmers
e8df8d64a2 Merge pull request #375 from rapsealk/fix/libcuda-to-torch: Replace libcudart.so with PyTorch's CUDA APIs 2023-07-13 21:54:47 -07:00
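This PR and the CUDA-setup commits above move the library away from hand-loading libcudart.so with ctypes. A minimal sketch of the idea, using standard torch calls; whether the repo uses exactly these calls is an assumption:

```python
# Ask PyTorch for CUDA facts instead of ctypes-loading libcudart.so.
import torch

def cuda_runtime_info():
    if not torch.cuda.is_available():
        return None, None
    cuda_version = torch.version.cuda                 # e.g. "11.8", set at build time
    capability = torch.cuda.get_device_capability(0)  # e.g. (8, 0) on an A100
    return cuda_version, capability

print(cuda_runtime_info())
```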
Tim Dettmers
c00402f17e Fixed a bug in absmax float conversion. 2023-07-13 21:47:38 -07:00
Tim Dettmers
6689afaec4 Merge pull request #567 from apbard/patch-1: [BugFix] replace view+contiguous with reshape 2023-07-13 21:45:00 -07:00
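A small self-contained illustration of why this swap is safe: view() requires a compatible memory layout, while reshape() falls back to a copy only when it must.

```python
import torch

t = torch.arange(6).reshape(2, 3).t()   # transpose makes t non-contiguous
# t.view(-1)                            # would raise a RuntimeError here
a = t.contiguous().view(-1)             # old pattern: explicit copy, then view
b = t.reshape(-1)                       # new pattern: one call, same values
assert torch.equal(a, b)
```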
Tim Dettmers
67475257a9 Added documentation for NF4; failing 8-bit matmul; fixed absmax bug. #529 #543 2023-07-13 21:41:43 -07:00
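For the NF4 documentation mentioned above, a hedged round-trip sketch; function names follow the bitsandbytes 0.40.x series, and exact signatures are assumptions:

```python
import torch
import bitsandbytes.functional as F

w = torch.randn(64, 64, device="cuda", dtype=torch.float16)
q, quant_state = F.quantize_4bit(w, quant_type="nf4")        # 4-bit NormalFloat
w_hat = F.dequantize_4bit(q, quant_state, quant_type="nf4")
print((w - w_hat).abs().max())   # small blockwise quantization error remains
```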
Tim Dettmers
8a20cd864b Added missing scipy requirement. Addressing #544 2023-07-13 21:25:07 -07:00
Tim Dettmers
097b1cc5da Fixed bug caused by undefined default type of absmax. #553 2023-07-13 21:23:33 -07:00
Tim Dettmers
7b6cfe1738 Added H100 support for CUDA 11.8 precompiled binaries. 2023-07-13 21:16:23 -07:00
Tim Dettmers
817bdf6325 Bumped version after hotfix. 2023-07-11 17:16:05 -07:00
Tim Dettmers
90b0ac57b0 Fixed missing bias in bnb.matmul_4bit for inference; more tests. 2023-07-11 17:13:33 -07:00
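The bias fix concerns bnb.matmul_4bit during inference. A hedged sketch of the fixed call path, mirroring how a 4-bit linear layer is assumed to invoke it in the 0.40.x series; the argument order and packed-weight handling are assumptions:

```python
import torch
import bitsandbytes as bnb
import bitsandbytes.functional as F

x = torch.randn(1, 64, device="cuda", dtype=torch.float16)
w = torch.randn(64, 64, device="cuda", dtype=torch.float16)
bias = torch.randn(64, device="cuda", dtype=torch.float16)

qw, quant_state = F.quantize_4bit(w, quant_type="nf4")  # packed 4-bit weight
y = bnb.matmul_4bit(x, qw.t(), quant_state, bias=bias)  # bias no longer dropped
```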
Tim Dettmers
dc96e9e7c8 Test for BLOOM that fails with inference kernels. 2023-07-11 15:40:20 -07:00
Tim Dettmers
ae7cd6ad14 Bump version. 2023-07-11 05:58:25 -07:00
Tim Dettmers
ba51d95d43 Added more extensive gemv tests; blocksize guard for gemv. 2023-07-11 05:55:49 -07:00
Tim Dettmers
b8da4a165a Bumped version. 2023-07-10 16:40:22 -07:00
Tim Dettmers
a26a321e07 Removed debugging statement. 2023-07-10 14:34:19 -07:00
Tim Dettmers
306f6b2362 Fixed accidental deletion of limits in kernel. 2023-07-10 14:24:33 -07:00
Tim Dettmers
2221f4cee0 Fixed potential memory leak. 2023-07-10 13:57:44 -07:00
Tim Dettmers
490153b29f Added generation tests. 2023-07-10 12:19:16 -07:00
Tim Dettmers
1c774ecebb Added ARCH guard for bfloat16 computations. 2023-07-10 09:53:23 -07:00
Tim Dettmers
0a1cced375 Fixed typo in cuda_install.sh. 2023-07-10 06:40:19 -07:00
Tim Dettmers
0d344b70ba Changelog and version bump. 2023-07-10 06:38:57 -07:00
Tim Dettmers
73aa4e0a33 Fixed Makefile and added CUDA 12.2 install. 2023-07-10 06:34:04 -07:00
Tim Dettmers
5f492d437e Merge remote-tracking branch 'origin/inference' 2023-07-10 06:24:24 -07:00
Tim Dettmers
196d6f5dc1 Merge pull request #469 from shadeMe/linear-layer-device: Add `device` parameter to `Linear` subclasses and `Embedding` 2023-07-10 06:17:13 -07:00
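This PR lets layers be constructed directly on a target device, mirroring stock torch.nn. A short sketch; the bitsandbytes line shows one plausible `Linear` subclass the PR touches, so treat it as an assumption:

```python
import torch.nn as nn
import bitsandbytes as bnb

lin = nn.Linear(32, 32, device="cuda")             # plain PyTorch already allows this
emb = nn.Embedding(100, 32, device="cuda")
blin = bnb.nn.Linear8bitLt(32, 32, device="cuda")  # PR #469 adds the same keyword
```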
Tim Dettmers
5fab673442 Added fp32 compute type for gemv_4bit. 2023-07-09 21:06:01 -07:00
Tim Dettmers
cef519c89e Added test for Param4bit.to() and fixed double quant behavior. 2023-07-09 17:16:50 -07:00
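A hedged usage sketch of the behavior that test covers: moving a 4-bit layer to the GPU should carry, or create, its quantization state. Class and keyword names follow the 0.40.x series and are assumptions:

```python
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear4bit(64, 64, quant_type="nf4")   # weights still floating-point on CPU
layer = layer.to("cuda")                              # assumed to quantize on this move
y = layer(torch.randn(1, 64, device="cuda", dtype=torch.float16))
```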
Tim Dettmers
6a905be5ce Fixed a bug where gemv_4bit would return a wrongly sized tensor. 2023-07-09 15:34:02 -07:00
Tim Dettmers
0f0390acb2 Added double quantization support and tests. 2023-07-09 15:32:03 -07:00
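Double quantization compresses the quantization statistics themselves. A conceptual, library-free sketch of the idea; the block size and the 8-bit second level are chosen here for illustration only:

```python
# First pass produces one absmax per block; the second pass quantizes those
# statistics too, cutting their overhead from 32 to 8 bits per block.
import torch

def double_quant_absmax(w: torch.Tensor, blocksize: int = 64):
    blocks = w.reshape(-1, blocksize)
    absmax = blocks.abs().max(dim=1).values            # fp32: 32 bits per block
    scale = absmax.max()                               # single second-level scale
    absmax_q = torch.round(absmax / scale * 255).to(torch.uint8)
    return absmax_q, scale                             # 8 bits per block + one scalar

absmax_q, scale = double_quant_absmax(torch.randn(4096, 64))
```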
Tim Dettmers
94168d79d7 Added FP4 fast inference support. 2023-07-09 14:46:19 -07:00
Tim Dettmers
4b88d69de7 Added arbitrary data types; fixed a bug for small matrices. 2023-07-09 12:04:09 -07:00
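bitsandbytes represents a quantization data type as a small lookup table (a "code"), so "arbitrary data types" plausibly means accepting any such table. A hedged sketch using the 8-bit blockwise interface; whether this commit's 4-bit kernel path exposes the same interface is an assumption:

```python
import torch
import bitsandbytes.functional as F

code = F.create_dynamic_map(signed=True)     # one such lookup-table data type
w = torch.randn(1024, device="cuda")
q, state = F.quantize_blockwise(w, code=code)
w_hat = F.dequantize_blockwise(q, state)
```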
Tim Dettmers
eefbf60270 Tuning optimization (float accumulation); 185 vs 50. 2023-07-08 16:31:58 -07:00
Tim Dettmers
7e49b5b938 Added warp_shuffle indexing; 185 vs 54. 2023-07-08 14:27:12 -07:00
Alessandro Pietro Bardelli
463630dc73 [BugFix] replace view+contiguous with reshape 2023-07-06 12:26:03 +02:00
Jeongseok Kang
a24aae30bf Merge branch 'main' into fix/libcuda-to-torch 2023-07-06 15:43:42 +09:00
Tim Dettmers
02fd80cb81 Added bfloat16 quantizations and tests. 2023-07-04 19:58:31 -07:00
Tim Dettmers
dfe6900b94 Vectorized loads, conflict-free NF4; 52 vs 172. 2023-07-04 15:20:10 -07:00
Tim Dettmers
f89ff93e26 Initial naive 4-bit, batch size 1; 81 vs 185. 2023-07-03 18:45:38 -07:00
Tim Dettmers
4395d68cf6 Release 0.39.1. 2023-06-19 19:40:41 -07:00
Tim Dettmers
2d321a7524 Merge pull request #503 from TimDettmers/efficient_8bit_serialize: Make 8-bit serialization more memory-efficient (v2) 2023-06-19 11:28:30 -07:00
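This PR and the commits below rearrange serialized 8-bit weights inside the state dict itself, so loading never materializes a second full-size rearranged copy. A hedged sketch; `get_tile_inds` is named in the commits, but the key layout and call shape here are assumptions:

```python
import torch

def rearrange_weight_in_state_dict(state_dict, key, tile_inds):
    weight = state_dict.get(key)
    if weight is None:          # "Only rearrange weight if it exists"
        return
    # permute rows of the serialized tensor directly, instead of loading it
    # first and building a rearranged copy afterwards
    state_dict[key] = weight[tile_inds].contiguous()
```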
Max Ryabinin
b599fdb197 Only rearrange weight if it exists 2023-06-14 19:27:13 +02:00
Max Ryabinin
c1f3f56d2c Rearrange the weights directly in state dict before loading 2023-06-09 21:58:39 +02:00
Max Ryabinin
f734076e94 Improve memory efficiency of 8-bit serialization 2023-06-09 21:39:57 +02:00
Max Ryabinin
4fb37d45c1 Extract get_tile_inds to a separate function 2023-06-09 21:39:37 +02:00
shadeMe
db49ad43ab Add device parameter to Embedding 2023-06-01 17:43:49 +02:00
shadeMe
9cac5dd1b6 Add device parameter to Linear subclasses 2023-06-01 17:43:30 +02:00
Tim Dettmers
e54d2730fc Added debugging functions. 2023-05-30 20:42:21 -07:00