Commit Graph

299 Commits

Author SHA1 Message Date
Tim Dettmers
7bfa09d0fc 8x32 240 6 warps. 2023-05-01 16:38:09 -07:00
Tim Dettmers
3d4a2eadd3 16x16 240. 2023-05-01 16:23:45 -07:00
Tim Dettmers
7cc8ff4727 Warp specialization 362. 2023-05-01 08:21:12 -07:00
Tim Dettmers
cabcd9b9d5 Halved shared memory 466. 2023-04-30 19:12:42 -07:00
Tim Dettmers
30d03e0254 64 threads, high smem, 434. 2023-04-30 18:55:12 -07:00
Tim Dettmers
e01d4e033d Fixed bank conflicts in non-vector load 422. 2023-04-30 18:28:52 -07:00
Tim Dettmers
c35ed09b66 Double frag 440. 2023-04-30 18:19:30 -07:00
Tim Dettmers
604bb3fb57 Slow non-vector 530. 2023-04-30 18:06:01 -07:00
Tim Dettmers
ad07d254fb Slow tensor core solution. 2023-04-30 17:43:02 -07:00
Tim Dettmers
21723f796a 4-bit draft. 2023-04-29 21:52:47 -07:00
Tim Dettmers
cad839941b Added bit template. 2023-04-28 22:10:42 -07:00
Tim Dettmers
f3e97ccbd2 New implementation for batch size 1. 2023-04-28 21:29:40 -07:00
Tim Dettmers
f6df4aef6a Added fp16 and thread/item template. 2023-04-28 18:26:52 -07:00
Tim Dettmers
3aef78342a Added template refactor. 2023-04-28 17:34:08 -07:00
Tim Dettmers
c1bfb210c5 First baseline kernel. 2023-04-28 17:19:02 -07:00
Tim Dettmers
9cab14a3ff Added pipeline draft. 2023-04-27 15:12:49 -07:00
Tim Dettmers
d1c4c20568 Added non-cutlass template. 2023-04-27 15:11:26 -07:00
Tim Dettmers
0afc8e9e2f Best attempt at cutlass3. 2023-04-26 17:12:34 -07:00
Tim Dettmers
84964db937 CUTLASS compiles. 2023-04-25 17:15:51 -07:00
Tim Dettmers
6e2544da25 Added cutlass example. 2023-04-25 16:15:44 -07:00
Tim Dettmers
6bfd7a405f Initial template. 2023-04-25 16:13:43 -07:00
Tim Dettmers
0f9d30207f Added nested quantization for blockwise quantization. 2023-04-19 11:48:47 -07:00
Tim Dettmers
7dc198feb7 Added 32-bit optimizer for bfloat16 gradients. 2023-04-17 18:01:49 -07:00
Tim Dettmers
b8ea2b416d Fixed bias conversion in Linear4bit 2023-04-12 12:28:35 -07:00
Tim Dettmers
e9fa03b717 Some fixes for loading PEFT modules with Params4bit. 2023-04-07 09:59:21 -07:00
Tim Dettmers
1ccb7bdec6 Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure. 2023-04-03 18:47:00 -07:00
Tim Dettmers
4ea489d3bf Refactor FP4 into 4Bit and integrate NF4 data type. 2023-04-03 11:00:12 -07:00
Tim Dettmers
64cc05920d First draft of NF4. 2023-04-02 16:10:35 -07:00
Tim Dettmers
4ad999d144 Added quantization tree generation. 2023-04-02 14:42:45 -07:00
Tim Dettmers
0d332a641f Added normal with extra value. 2023-04-02 14:09:08 -07:00
Tim Dettmers
2dd5d69056 Generalized FP4 data type. 2023-04-02 12:42:01 -07:00
Tim Dettmers
51a21df728 Added 8-bit compression to quantization statistics. 2023-04-01 16:10:18 -07:00
Tim Dettmers
c4cfe4fbdd Added bf16 Adam. 2023-04-01 10:33:03 -07:00
Tim Dettmers
8645d1f71c Added normal quant. 2023-03-29 18:41:37 -07:00
Tim Dettmers
69810521d3 Some small changes. 2023-03-27 09:12:57 -07:00
Artidoro Pagnoni
6c31a5fe99 t5 model fix 2023-02-27 14:23:21 -08:00
Tim Dettmers
9851a10b46 Added cast to fp4 layer for speed. 2023-02-24 10:17:57 -08:00
Tim Dettmers
c93a90d075 Fixed FP4 import and data type conversion in backward. 2023-02-14 13:31:39 -08:00
Tim Dettmers
7f0773aede Added backprop test for Linear8bitLt and LinearFP4. 2023-02-05 06:49:54 -08:00
Tim Dettmers
c0c352b379 Added bias test for LinearFP4 and basic test. 2023-02-05 06:29:52 -08:00
Tim Dettmers
c361f84239 Fixed matmul_fp4 transpose. 2023-02-05 06:16:56 -08:00
Tim Dettmers
cfe4705e32 Added matmul_fp4 to the benchmark. 2023-02-04 22:00:04 -08:00
Tim Dettmers
13c0a4dc5d Backward matmul_fp4 passes. 2023-02-04 21:35:43 -08:00
Tim Dettmers
160a83580d Forward matmul_fp4 tests pass. 2023-02-04 21:11:21 -08:00
Tim Dettmers
3ac5840c03 Added fp4 quant/dequant and dequant optimizations. 2023-02-04 14:52:04 -08:00
Tim Dettmers
0f5c394870 Added version 0.37.0. 2023-02-01 20:27:01 -08:00
Tim Dettmers
de53588934 Added Int8 matmul support for all GPUs. Full backward support. 2023-02-01 20:09:31 -08:00
Tim Dettmers
92ab6a8d5f Merge pull request #119 from stas00/patch-1: improve install instructions 2023-02-01 19:21:36 -08:00
Stas Bekman
c5372a8567 improve install instructions 2023-01-05 13:34:51 -08:00
Tim Dettmers
1341fb44ad Fixed issue where the CUDA SETUP was not printed. 2023-01-04 03:50:53 -08:00