bitsandbytes-rocm

Author	SHA1	Message	Date
Tim Dettmers	ec1ea63711	Refactored triton into its own folder. Refactored fp8 matmuls.	2023-04-12 09:39:39 -07:00
Tim Dettmers	4cd63deff3	Fixed CUDA Conda PyTorch 2.0 issues.	2023-04-11 12:10:20 -07:00
Tim Dettmers	2eb3108356	Fixed bug where beta2 was not passed into Lion 32-bit.	2023-04-11 09:16:01 -07:00
Tim Dettmers	792af5c883	Fixed noisy tests for 8-bit Lion.	2023-04-11 08:42:41 -07:00
Tim Dettmers	ed6f3eb146	Merge pull request #159 from TimDettmers/serialize_8bit Implement proper serialization of Linear8bitLt	2023-04-11 07:24:51 -07:00
Tim Dettmers	e9fa03b717	Some fixed for loading PEFT modules with Params4bit.	2023-04-07 09:59:21 -07:00
Tim Dettmers	1ccb7bdec6	Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure.	2023-04-03 18:47:00 -07:00
Tim Dettmers	4ea489d3bf	Refactor FP4 into 4Bit and integrate NF4 data type.	2023-04-03 11:00:12 -07:00
Tim Dettmers	64cc05920d	First draft of NF4.	2023-04-02 16:10:35 -07:00
Tim Dettmers	4ad999d144	Added quantization tree generation.	2023-04-02 14:42:45 -07:00
Tim Dettmers	0d332a641f	Added normal with extra value.	2023-04-02 14:09:08 -07:00
Tim Dettmers	2dd5d69056	Generalized FP4 data type.	2023-04-02 12:42:01 -07:00
Tim Dettmers	51a21df728	Added 8-bit compression to quantization statistics.	2023-04-01 16:10:18 -07:00
Mitchell Wortsman	7f87ba83ee	cleaning and refactor	2023-04-01 18:46:04 +00:00
Tim Dettmers	c4cfe4fbdd	Added bf16 Adam.	2023-04-01 10:33:03 -07:00
Tim Dettmers	30d21d585c	Added triton test.	2023-03-31 11:33:26 -07:00
Tim Dettmers	a13a522c4c	Added first triton test.	2023-03-31 11:20:54 -07:00
Tim Dettmers	8645d1f71c	Added normal quant.	2023-03-29 18:41:37 -07:00
Mitchell Wortsman	b373034e31	test	2023-03-29 19:04:53 +00:00
Mitchell Wortsman	5f3d9ada8d	triton-v1	2023-03-29 06:47:08 +00:00
Tim Dettmers	69810521d3	Some small changes.	2023-03-27 09:12:57 -07:00
Phil Wang	a43cd2008d	add some code in test_optim.py, although it seems to be failing	2023-03-22 09:14:05 -07:00
Max Ryabinin	dcecbb26ca	Add force_no_igemmlt to test params	2023-03-22 00:28:49 +01:00
Phil Wang	8de29fc364	forget about tests for now, will test live on local enwik8 training	2023-03-09 10:11:32 -08:00
Phil Wang	cb4c3c8c66	do a bunch of typical bookkeeping before getting to main lion logic	2023-03-09 10:10:19 -08:00
Max Ryabinin	ac3ab281e3	Handle more cases in test_linear_serialization	2023-02-25 06:01:04 +01:00
Tim Dettmers	c5c38ca19c	Added matmul_mixed.	2023-02-23 10:45:18 -08:00
Max Ryabinin	58b09ee1b1	[WIP] Implement proper serialization of Linear8bitLt	2023-02-21 12:04:47 +01:00
Tim Dettmers	2489d819c5	Added more blocksizes for stochastic rounding; fixed dequant blocksize.	2023-02-14 13:55:17 -08:00
Tim Dettmers	2dfa3ce16d	Fixed LinearFP8 and added tests.	2023-02-13 17:48:52 -08:00
Tim Dettmers	ca3236587a	Added forward/backward tests; removed bias.	2023-02-13 17:20:52 -08:00
Tim Dettmers	6bdb6c351e	Added fp8 simulation layer.	2023-02-13 16:53:07 -08:00
Tim Dettmers	7f0773aede	Added backprop test for Linear8bitLt and LinearFP4.	2023-02-05 06:49:54 -08:00
Tim Dettmers	c0c352b379	Added bias test for LinearFP4 and basic test.	2023-02-05 06:29:52 -08:00
Tim Dettmers	c361f84239	Fixed matmul_fp4 transpose.	2023-02-05 06:16:56 -08:00
Tim Dettmers	cfe4705e32	Added matmul_fp4 to the benchmark.	2023-02-04 22:00:04 -08:00
Tim Dettmers	13c0a4dc5d	Backward matmul_fp4 passes.	2023-02-04 21:35:43 -08:00
Tim Dettmers	160a83580d	Forward matmul_fp4 tests pass.	2023-02-04 21:11:21 -08:00
Tim Dettmers	3ac5840c03	Added fp4 quant/dequant and dequant optimizations.	2023-02-04 14:52:04 -08:00
Tim Dettmers	de53588934	Added Int8 matmul support for all GPUs. Full backward support.	2023-02-01 20:09:31 -08:00
Tim Dettmers	c9f505064e	Added outlier detector and fake quantization layer.	2023-01-28 17:05:22 -08:00
Tim Dettmers	336e24696c	CUDASetup only executed once + fixed circular import.	2023-01-02 03:31:43 -08:00
Tim Dettmers	c91f592ad7	Merge branch 'main' into cleanup	2023-01-02 11:19:16 +01:00
Tim Dettmers	eb028e6ebc	Fixed k-bit quantization maps.	2022-11-19 07:24:03 -08:00
Tom Aarsen	b104ce3b62	Merge branch 'main' into cleanup	2022-11-17 15:22:29 +01:00
Tim Dettmers	08fa2e7b01	Fixed bug in cpu quant; faster GPU dequant.	2022-11-07 18:06:18 -08:00
Tim Dettmers	e0e697b150	Fixed blockwise test and logic.	2022-11-06 16:36:31 -08:00
Tim Dettmers	6bc2b992be	Added blocksizes 2048, 1024, and 512 to blockwise quant.	2022-11-06 16:27:48 -08:00
Tim Dettmers	2f2063bac2	Added k<256 quantile estimate.	2022-11-06 13:05:25 -08:00
Tim Dettmers	98cbc4bc4f	Added k-bit fp8 map.	2022-11-06 11:59:37 -08:00
Tim Dettmers	caf1832526	Added k-bit linear quantization.	2022-11-06 11:47:54 -08:00
Tim Dettmers	1efb87d89d	Added FP8 quantization map.	2022-11-03 19:49:50 -07:00
Tom Aarsen	7a3c9af05d	Sort imports Via isort	2022-10-27 13:15:21 +02:00
Tom Aarsen	0b078403ee	Simplify statements into equivalent, modern variants via pyupgrade --py37-plus. The changes e.g. are subclassing from object, calling super() with super(ThisClass, self), or old-style syntax formatting.	2022-10-27 13:14:13 +02:00
Tom Aarsen	1eec77d34c	Remove trailing whitespace & ensure newline at EOF	2022-10-27 13:11:29 +02:00
Tim Dettmers	a371be302d	Added CUDA SETUP instruction generator.	2022-10-25 08:01:19 -07:00
Tim Dettmers	df86625a93	Isolated CUDASetup logging; all tests green.	2022-10-24 11:54:25 -07:00
justheuristic	76ce9aa6da	try fp32	2022-09-20 06:51:25 +03:00
Tim Dettmers	292a478716	set threshold	2022-09-20 06:42:05 +03:00
justheuristic	a07825ac31	review	2022-09-20 06:40:36 +03:00
justheuristic	cff3a71599	cast device	2022-09-18 01:26:25 +03:00
justheuristic	32a9a88f98	cast device	2022-09-18 01:26:12 +03:00
justheuristic	01b4c6a048	cast device	2022-09-18 01:25:56 +03:00
justheuristic	e4086a2758	cast device	2022-09-18 01:24:57 +03:00
justheuristic	725cc72993	cast device	2022-09-18 01:24:44 +03:00
justheuristic	28a9313ddc	cast before allclose	2022-09-18 01:24:27 +03:00
justheuristic	95dafc6475	cast before allclose	2022-09-18 01:22:31 +03:00
justheuristic	37f805bb44	debug	2022-09-18 01:21:12 +03:00
justheuristic	6a826c41a6	pre-cast	2022-09-18 01:20:34 +03:00
justheuristic	d9b8789818	debug	2022-09-18 01:13:58 +03:00
justheuristic	2cd047e35d	run backward	2022-09-18 00:55:53 +03:00
justheuristic	591f60395a	add memory efficient backward	2022-09-18 00:52:53 +03:00
justheuristic	f6670329fb	bump threshold to 0.21	2022-09-18 00:42:23 +03:00
justheuristic	fa8e07c7c5	more lenient threshold	2022-09-18 00:38:02 +03:00
justheuristic	e35e2c665a	cast properly	2022-09-18 00:35:03 +03:00
justheuristic	d9ca0ed905	un-fuse bias	2022-09-17 23:44:28 +03:00
justheuristic	7facedda38	copypaste tolerances	2022-09-17 23:41:40 +03:00
justheuristic	e29c5f5c41	clearer assertions	2022-09-17 23:22:04 +03:00
justheuristic	9379df85d2	check dtypes first	2022-09-17 23:13:23 +03:00
justheuristic	140cdbe876	check dtypes first	2022-09-17 23:12:58 +03:00
justheuristic	a9c7953e0a	cast to half before double_quant	2022-09-17 23:10:21 +03:00
justheuristic	469d5a631d	test_bf16	2022-09-17 23:06:57 +03:00
Tim Dettmers	c05dd42ddd	Fixed cpu blockwise quantization for small input tensors.	2022-09-13 10:37:53 -07:00
Tim Dettmers	19a7adca7a	Fixed 2^31 max size issue for cpu blockwise quant.	2022-09-11 11:55:09 -07:00
Tim Dettmers	7e0fb655e1	Some initial code. Needs to be tested.	2022-08-23 13:59:34 -07:00
Tim Dettmers	9d60b3c527	Fixed bug in Linear8bitLt, when the bias is None.	2022-08-17 03:45:57 -07:00
Tim Dettmers	de354f7ded	Added fused bias to matmullt.	2022-08-16 12:00:54 -07:00
Tim Dettmers	dede343033	Added fused bias in dequant_mm.	2022-08-16 11:12:09 -07:00
Tim Dettmers	1ed2fa2f21	Removed storage() from get_ptr; added boilerplate for bias dequant_mm.	2022-08-16 10:56:17 -07:00
Tim Dettmers	c472bd56f0	Added the case that all env variables are empty (CUDA docker).	2022-08-05 08:57:52 -07:00
Tim Dettmers	8f84674d67	Fixed bugs in cuda setup.	2022-08-04 09:16:00 -07:00
Tim Dettmers	758c7175a2	Merge branch 'debug' into cuda-bin-switch-and-cli	2022-08-04 08:03:00 -07:00
Tim Dettmers	cc5b323876	Merge branch 'extract_outliers' into debug	2022-08-04 07:40:48 -07:00
Tim Dettmers	451fd9506e	Added fixes for the case that matmullt dim A is zero, e.g. [0, 768].	2022-08-03 11:54:01 -07:00
Titus von Koeller	59a615b386	factored cuda_setup.main out into smaller modules and functions	2022-08-02 21:26:50 -07:00
Tim Dettmers	3479d02a76	Added some more docs and comments.	2022-08-01 19:43:09 -07:00
Tim Dettmers	8bf3e9faab	Added full env variable search; CONDA_PREFIX priority.	2022-08-01 19:22:41 -07:00
Titus von Koeller	ea7c14f8ef	reran black with linelength 80 for greater readability	2022-08-01 09:32:47 -07:00
Titus von Koeller	bfa0e33294	ran black and isort for coherent code formatting	2022-08-01 03:31:48 -07:00
Tim Dettmers	dd50382b32	Full evaluate_cuda setup with integration test.	2022-07-31 17:47:44 -07:00

165 Commits