Commit Graph

484 Commits

Author SHA1 Message Date
Tim Dettmers
604bb3fb57 Slow non-vector 530. 2023-04-30 18:06:01 -07:00
Tim Dettmers
ad07d254fb Slow tensor core solution. 2023-04-30 17:43:02 -07:00
Tim Dettmers
21723f796a 4-bit draft. 2023-04-29 21:52:47 -07:00
Tim Dettmers
cad839941b Added bit template. 2023-04-28 22:10:42 -07:00
Tim Dettmers
f3e97ccbd2 New implementation for batch size 1. 2023-04-28 21:29:40 -07:00
Tim Dettmers
f6df4aef6a Added fp16 and thread/item template. 2023-04-28 18:26:52 -07:00
Tim Dettmers
3aef78342a Added template refactor. 2023-04-28 17:34:08 -07:00
Tim Dettmers
c1bfb210c5 First baseline kernel. 2023-04-28 17:19:02 -07:00
rapsealk
2b4cc256f6 fix: Get device's compute capability 2023-04-28 11:18:54 +09:00
Tim Dettmers
9cab14a3ff Added pipeline draft. 2023-04-27 15:12:49 -07:00
Tim Dettmers
d1c4c20568 Added non-cutlass template. 2023-04-27 15:11:26 -07:00
Tim Dettmers
0afc8e9e2f Best attempt at cutlass3. 2023-04-26 17:12:34 -07:00
Jeongseok Kang
f5110265ff fix: Remove unused code 2023-04-26 11:54:17 +09:00
Tim Dettmers
84964db937 CUTLASS compiles. 2023-04-25 17:15:51 -07:00
Tim Dettmers
6e2544da25 Added cutlass example. 2023-04-25 16:15:44 -07:00
Tim Dettmers
6bfd7a405f Initial template. 2023-04-25 16:13:43 -07:00
rapsealk
9836b0b90f fix: Use raw int 2023-04-25 17:12:27 +09:00
rapsealk
eb54c55b61 fix: Get CUDA compiled version through pytorch 2023-04-25 17:08:22 +09:00
rapsealk
97b2567ada fix: Replace libcudart with pytorch api 2023-04-25 17:00:02 +09:00
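The three commits above replace direct `libcudart` calls with PyTorch APIs for querying CUDA information. A minimal sketch of that approach (an illustration of the idea, not the library's actual code; the helper name is hypothetical) might look like:

```python
# Hypothetical sketch: query the compiled CUDA version and device compute
# capability through PyTorch instead of loading libcudart directly.
import importlib.util

def cuda_info_via_pytorch():
    """Return (compiled CUDA version, compute capability) via PyTorch, if present."""
    if importlib.util.find_spec("torch") is None:
        return None, None  # PyTorch not installed
    import torch
    cuda_version = torch.version.cuda  # e.g. "11.8"; None for CPU-only builds
    capability = (
        torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
    )
    return cuda_version, capability

print(cuda_info_via_pytorch())
```

This avoids locating and dlopen-ing `libcudart` manually, since PyTorch already records the CUDA version it was compiled against.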
Tim Dettmers
0f9d30207f Added nested quantization for blockwise quantization. 2023-04-19 11:48:47 -07:00
Tim Dettmers
7dc198feb7 Added 32-bit optimizer for bfloat16 gradients. 2023-04-17 18:01:49 -07:00
Tim Dettmers
9e7cdc9ea9 Added last SwitchBack refactors. All tests green. 2023-04-12 13:41:30 -07:00
Tim Dettmers
008dfff9b4 Added triton utils. 2023-04-12 12:57:46 -07:00
Tim Dettmers
b8ea2b416d Fixed bias conversion in Linear4bit 2023-04-12 12:28:35 -07:00
Tim Dettmers
5b612bc6df Added is_available_triton guard to Triton SwitchBackLinear. 2023-04-12 12:16:55 -07:00
Tim Dettmers
c3d87e4435 Added is_available_triton guard. 2023-04-12 12:10:34 -07:00
Tim Dettmers
7140c01405 Merge branch 'main' into fp8_merge 2023-04-12 11:44:39 -07:00
Tim Dettmers
32f8c89201 Added missing example folder. 2023-04-12 11:27:31 -07:00
Tim Dettmers
dd562c24f1 Refactored simulated fp8 modules into research.nn. 2023-04-12 11:24:44 -07:00
Tim Dettmers
e67bfccbcd Added missing triton and fp8 files. 2023-04-12 10:06:18 -07:00
Tim Dettmers
ec1ea63711 Refactored triton into its own folder. Refactored fp8 matmuls. 2023-04-12 09:39:39 -07:00
Tim Dettmers
7c651012fc Added better error message for debugging on CUDA not detected failures. 2023-04-12 07:56:52 -07:00
Tim Dettmers
659a7dfc71 Fixing #300. 2023-04-11 16:14:29 -07:00
Tim Dettmers
eb1c331c84 Updates README and CHANGELOG. 2023-04-11 15:49:01 -07:00
Tim Dettmers
89e3b82731 Added more detailed cuda setup debug and debugging instructions. 2023-04-11 13:47:10 -07:00
Tim Dettmers
4cd63deff3 Fixed CUDA Conda PyTorch 2.0 issues. 2023-04-11 12:10:20 -07:00
Tim Dettmers
2bb5c00ba9 Added pre/post call to all lib calls. Fixes #120 2023-04-11 09:36:56 -07:00
Tim Dettmers
29ab3a6b14 Updated change log. 2023-04-11 09:26:52 -07:00
Tim Dettmers
2eb3108356 Fixed bug where beta2 was not passed into Lion 32-bit. 2023-04-11 09:16:01 -07:00
Tim Dettmers
792af5c883 Fixed noisy tests for 8-bit Lion. 2023-04-11 08:42:41 -07:00
Tim Dettmers
0b2ebcdab9 Added launch bounds to fix launch resource error for Lion. 2023-04-11 08:37:02 -07:00
Tim Dettmers
ed6f3eb146
Merge pull request #159 from TimDettmers/serialize_8bit
Implement proper serialization of Linear8bitLt
2023-04-11 07:24:51 -07:00
Tim Dettmers
b0ec20c3b3
Merge pull request #188 from lucidrains/main
Lion 8 bit
2023-04-11 07:22:45 -07:00
Tim Dettmers
d3e0e39def
Merge pull request #190 from svgsponer/Fix#157
Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist
2023-04-11 07:20:16 -07:00
Tim Dettmers
c7875533ce
Merge pull request #213 from tonylins/dev/fix_no_absmax
Fix a bug in (de)quantize_no_absmax with multiple GPUs
2023-04-11 07:18:24 -07:00
Tim Dettmers
6b4c5afe21
Merge pull request #260 from rapsealk/fix_libsbitsandbytes_cpu_so
Fixed typo libsbitsandbytes_cpu.so
2023-04-11 07:15:42 -07:00
Tim Dettmers
72efa32962
Merge pull request #292 from justheuristic/patch-2
Support nvidia16 GPUs
2023-04-11 07:14:12 -07:00
justheuristic
5e456be50e Support 1650, 1660 2023-04-10 21:26:52 +03:00
Mitchell Wortsman
d677a71607 typo 2023-04-08 19:36:17 +00:00
Mitchell Wortsman
da524d97c9 mem efficient 2023-04-08 19:34:18 +00:00