bitsandbytes-rocm

Author	SHA1	Message	Date
Tim Dettmers	4cd63deff3	Fixed CUDA Conda PyTorch 2.0 issues.	2023-04-11 12:10:20 -07:00
Tim Dettmers	2bb5c00ba9	Added pre/post call to all lib calls. Fixes #120	2023-04-11 09:36:56 -07:00
Tim Dettmers	2eb3108356	Fixed bug where beta2 was not passed into Lion 32-bit.	2023-04-11 09:16:01 -07:00
Tim Dettmers	ed6f3eb146	Merge pull request #159 from TimDettmers/serialize_8bit Implement proper serialization of Linear8bitLt	2023-04-11 07:24:51 -07:00
Tim Dettmers	b0ec20c3b3	Merge pull request #188 from lucidrains/main Lion 8 bit	2023-04-11 07:22:45 -07:00
Tim Dettmers	d3e0e39def	Merge pull request #190 from svgsponer/Fix#157 Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist	2023-04-11 07:20:16 -07:00
Tim Dettmers	c7875533ce	Merge pull request #213 from tonylins/dev/fix_no_absmax Gix a bug in (de)quantize_no_absmax with multiple GPUs	2023-04-11 07:18:24 -07:00
Tim Dettmers	6b4c5afe21	Merge pull request #260 from rapsealk/fix_libsbitsandbytes_cpu_so Fixed typo libsbitsandbytes_cpu.so	2023-04-11 07:15:42 -07:00
justheuristic	5e456be50e	Support 1650, 1660	2023-04-10 21:26:52 +03:00
Mitchell Wortsman	d677a71607	typo	2023-04-08 19:36:17 +00:00
Mitchell Wortsman	da524d97c9	mem efficient"	2023-04-08 19:34:18 +00:00
Tim Dettmers	e9fa03b717	Some fixed for loading PEFT modules with Params4bit.	2023-04-07 09:59:21 -07:00
Jeongseok Kang	8cceff72db	Fixed typo libsbitsandbytes_cpu.so	2023-04-05 09:28:41 +09:00
Tim Dettmers	1ccb7bdec6	Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure.	2023-04-03 18:47:00 -07:00
Tim Dettmers	4ea489d3bf	Refactor FP4 into 4Bit and integrate NF4 data type.	2023-04-03 11:00:12 -07:00
Tim Dettmers	64cc05920d	First draft of NF4.	2023-04-02 16:10:35 -07:00
Tim Dettmers	4ad999d144	Added quantization tree generation.	2023-04-02 14:42:45 -07:00
Tim Dettmers	0d332a641f	Added normal with extra value.	2023-04-02 14:09:08 -07:00
Tim Dettmers	51a21df728	Added 8-bit compression to quantization statistics.	2023-04-01 16:10:18 -07:00
Mitchell Wortsman	7f87ba83ee	cleaning and refactor	2023-04-01 18:46:04 +00:00
Tim Dettmers	c4cfe4fbdd	Added bf16 Adam.	2023-04-01 10:33:03 -07:00
Tim Dettmers	a13a522c4c	Added first triton test.	2023-03-31 11:20:54 -07:00
Tim Dettmers	8645d1f71c	Added normal quant.	2023-03-29 18:41:37 -07:00
Mitchell Wortsman	5f3d9ada8d	triton-v1	2023-03-29 06:47:08 +00:00
Tim Dettmers	69810521d3	Some small changes.	2023-03-27 09:12:57 -07:00
Mitchell Wortsman	51f8bb7133	pre-triton update	2023-03-24 05:44:42 +00:00
Ji Lin	b6383ba116	fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus	2023-03-22 22:14:57 -04:00
Severin Gsponer	c4866ab06e	Fix #157 ; Add XDG_GREETER_DATA_DIR to ignorelist	2023-03-11 15:35:23 +01:00
Phil Wang	19b9ef34b9	whoops	2023-03-10 08:59:49 -08:00
Phil Wang	c99b44f774	do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense)	2023-03-10 08:57:59 -08:00
Phil Wang	c83888aa1a	use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions	2023-03-09 11:54:54 -08:00
Phil Wang	cb4c3c8c66	do a bunch of typical bookkeeping before getting to main lion logic	2023-03-09 10:10:19 -08:00
Phil Wang	d43ea9722c	make sure interface is correct	2023-03-09 09:45:33 -08:00
Phil Wang	7247cb4554	initial commit, slowly work from interface into the kernel	2023-03-09 08:08:46 -08:00
Artidoro Pagnoni	6c31a5fe99	t5 model fix	2023-02-27 14:23:21 -08:00
Max Ryabinin	24609b66af	Reduce diff	2023-02-25 06:24:58 +01:00
Max Ryabinin	d15822a54b	Refactor _tile_indices into a cached property, fix device bug	2023-02-25 06:23:07 +01:00
Max Ryabinin	cc608c04c2	Revert the layout if weights were reordered	2023-02-25 06:02:06 +01:00
Max Ryabinin	cd4d904a4c	Raise an error when loading a quantized checkpoint before quantization	2023-02-25 06:01:34 +01:00
Tim Dettmers	9851a10b46	Added cast to fp4 layer for speed.	2023-02-24 10:17:57 -08:00
Mitchell Wortsman	75377d125e	new experiments	2023-02-24 00:10:15 +00:00
Tim Dettmers	5d2e23e8d6	Merge branch 'fp8sim' of github.com:TimDettmers/bitsandbytes into fp8sim	2023-02-23 10:56:49 -08:00
Tim Dettmers	c5c38ca19c	Added matmul_mixed.	2023-02-23 10:45:18 -08:00
Mitchell Wortsman	3fbf60ad83	sim now worse than real	2023-02-23 08:27:15 +00:00
Max Ryabinin	58b09ee1b1	[WIP] Implement proper serialization of Linear8bitLt	2023-02-21 12:04:47 +01:00
Mitchell Wortsman	7b764d3569	adding half() cast	2023-02-21 03:53:44 +00:00
Tim Dettmers	2489d819c5	Added more blocksizes for stochastic rounding; fixed dequant blocksize.	2023-02-14 13:55:17 -08:00
Tim Dettmers	c93a90d075	Fixed FP4 import and data type conversion in backward.	2023-02-14 13:31:39 -08:00
Tim Dettmers	2dfa3ce16d	Fixed LinearFP8 and added tests.	2023-02-13 17:48:52 -08:00
Tim Dettmers	fa255cbc56	Added missing import.	2023-02-13 17:29:39 -08:00
Tim Dettmers	ca3236587a	Added forward/backward tests; removed bias.	2023-02-13 17:20:52 -08:00
Tim Dettmers	6bdb6c351e	Added fp8 simulation layer.	2023-02-13 16:53:07 -08:00
Tim Dettmers	c0c352b379	Added bias test for LinearFP4 and basic test.	2023-02-05 06:29:52 -08:00
Tim Dettmers	c361f84239	Fixed matmul_fp4 transpose.	2023-02-05 06:16:56 -08:00
Tim Dettmers	cfe4705e32	Added matmul_fp4 to the benchmark.	2023-02-04 22:00:04 -08:00
Tim Dettmers	13c0a4dc5d	Backward matmul_fp4 passes.	2023-02-04 21:35:43 -08:00
Tim Dettmers	160a83580d	Forward matmul_fp4 tests pass.	2023-02-04 21:11:21 -08:00
Tim Dettmers	3ac5840c03	Added fp4 quant/dequant and dequant optimizations.	2023-02-04 14:52:04 -08:00
Kashif Rasul	c52365ac1d	Merge branch 'main' into patch-1	2023-02-03 09:01:48 +01:00
Tim Dettmers	0f5c394870	Added version 0.37.0.	2023-02-01 20:27:01 -08:00
Tim Dettmers	de53588934	Added Int8 matmul support for all GPUs. Full backward support.	2023-02-01 20:09:31 -08:00
Tim Dettmers	c9f505064e	Added outlier detector and fake quantization layer.	2023-01-28 17:05:22 -08:00
Kashif Rasul	59bf8fcff2	fix CUDASetup call	2023-01-04 17:47:18 +01:00
Kashif Rasul	792f6213a7	Fix for python 3.7	2023-01-04 17:38:33 +01:00
Tim Dettmers	1341fb44ad	Fixed issue where the CUDA SETUP was not printed.	2023-01-04 03:50:53 -08:00
Tim Dettmers	b3de19218e	Added error message for unexpected CUDA exception.	2023-01-03 06:57:07 -08:00
Tim Dettmers	81990491ff	Merge pull request #113 from Borzik/fix-warnings Import missing warn function	2023-01-03 15:46:58 +01:00
Tim Dettmers	9180b4cc11	Added additional error message for cudart error #85	2023-01-03 06:44:11 -08:00
Tim Dettmers	211ad594df	Added error+instructions for unsupported CUDA 10.0 version #82	2023-01-03 06:07:35 -08:00
Felix Borzik	f3800bab75	import warn function	2023-01-03 13:23:34 +00:00
Tim Dettmers	9d353ca786	Merge pull request #87 from lostmsu/main Add `device` and `dtype` parameters to `StableEmbedding`	2023-01-02 13:22:45 +01:00
Tim Dettmers	7a6563b6c8	Default to CPU library on CUDA error+small refactor.	2023-01-02 03:47:09 -08:00
Tim Dettmers	d9112dc55b	Merge pull request #110 from BlackHC/cublaslt_version Improve cc version detection for cublaslt	2023-01-02 12:35:53 +01:00
Tim Dettmers	336e24696c	CUDASetup only executed once + fixed circular import.	2023-01-02 03:31:43 -08:00
Tim Dettmers	be5cecb88f	Merge branch 'main' into main	2023-01-02 11:23:17 +01:00
Tim Dettmers	c91f592ad7	Merge branch 'main' into cleanup	2023-01-02 11:19:16 +01:00
blackhc	ed17aa9a31	Don't mark it as failure though.	2022-12-29 23:50:48 +00:00
blackhc	7b39a5511d	Fix issue #97	2022-12-29 23:47:21 +00:00
Tim Dettmers	c059bd2848	Added additional blocksizes: {64, 128, 256}.	2022-11-20 14:18:15 -08:00
Tim Dettmers	eb028e6ebc	Fixed k-bit quantization maps.	2022-11-19 07:24:03 -08:00
Tom Aarsen	b104ce3b62	Merge branch 'main' into cleanup	2022-11-17 15:22:29 +01:00
Tim Dettmers	08fa2e7b01	Fixed bug in cpu quant; faster GPU dequant.	2022-11-07 18:06:18 -08:00
Tim Dettmers	62a333ac40	Added pre/post calls do quantize_blockwise.	2022-11-06 17:17:51 -08:00
Tim Dettmers	e0e697b150	Fixed blockwise test and logic.	2022-11-06 16:36:31 -08:00
Tim Dettmers	6bc2b992be	Added blocksizes 2048, 1024, and 512 to blockwise quant.	2022-11-06 16:27:48 -08:00
Tim Dettmers	2f2063bac2	Added k<256 quantile estimate.	2022-11-06 13:05:25 -08:00
Tim Dettmers	98cbc4bc4f	Added k-bit fp8 map.	2022-11-06 11:59:37 -08:00
Tim Dettmers	caf1832526	Added k-bit linear quantization.	2022-11-06 11:47:54 -08:00
Victor Nova	62d39a237c	add device and dtype parameters to StableEmbedding	2022-11-04 14:12:46 -07:00
Tim Dettmers	1efb87d89d	Added FP8 quantization map.	2022-11-03 19:49:50 -07:00
Tom Aarsen	62c0bd2278	Fix several typos in logging and comments Via codespell	2022-11-01 09:53:47 +01:00
Tom Aarsen	d504050ff7	Call isort over cuda_setup/main.py	2022-11-01 09:46:03 +01:00
Tom Aarsen	30f28b94a0	Merge branch 'main' into cleanup	2022-11-01 09:43:49 +01:00
Tim Dettmers	8d87c0b852	Fixed CUDA setup bugs, including #81 .	2022-10-31 18:04:49 -07:00
adpkadspokasdk	8724c990c7	allow hiding of the welcome message	2022-10-27 16:04:49 -06:00
Tim Dettmers	4844aef4ff	Fixing bad error when GPU was not detected for #73 .	2022-10-27 08:54:30 -07:00
Tom Aarsen	c6dad28a08	Remove extraneous get_ptr calls	2022-10-27 13:53:16 +02:00
Tom Aarsen	7727fa4c8c	Remove f-prefix from strings that don't use formatting	2022-10-27 13:36:39 +02:00
Tom Aarsen	54bd6ed1d6	Remove unused imports	2022-10-27 13:32:01 +02:00
Tom Aarsen	ef70f2adcd	Fix bad indentation	2022-10-27 13:27:18 +02:00

280 Commits