Tim Dettmers
|
869b7e83b5
|
Warp multi-specialization 240.
|
2023-05-02 12:10:32 -07:00 |
|
Tim Dettmers
|
77f15fdce9
|
Shared memory efficient 240.
|
2023-05-02 11:38:11 -07:00 |
|
Tim Dettmers
|
89cccd8196
|
A tile multi-tiling.
|
2023-05-02 09:40:31 -07:00 |
|
Tim Dettmers
|
4decb3cc68
|
Removed unnecessary sync.
|
2023-05-02 09:38:14 -07:00 |
|
Tim Dettmers
|
394749db71
|
Correct implementation 240.
|
2023-05-02 08:58:59 -07:00 |
|
Tim Dettmers
|
9aa232cc39
|
Initial.
|
2023-05-02 07:53:29 -07:00 |
|
Tim Dettmers
|
9192c9de64
|
Tighter and scaled error analysis.
|
2023-05-02 07:50:32 -07:00 |
|
Tim Dettmers
|
f9bfea8f23
|
Baseline for debugging.
|
2023-05-02 07:24:12 -07:00 |
|
Tim Dettmers
|
7bfa09d0fc
|
8x32 240 6 warps.
|
2023-05-01 16:38:09 -07:00 |
|
Tim Dettmers
|
3d4a2eadd3
|
16x16 240.
|
2023-05-01 16:23:45 -07:00 |
|
Tim Dettmers
|
7cc8ff4727
|
Warp specialization 362.
|
2023-05-01 08:21:12 -07:00 |
|
Tim Dettmers
|
cabcd9b9d5
|
Halved shared memory 466.
|
2023-04-30 19:12:42 -07:00 |
|
Tim Dettmers
|
30d03e0254
|
64 threads, high smem, 434.
|
2023-04-30 18:55:12 -07:00 |
|
Tim Dettmers
|
e01d4e033d
|
Fixed bank conflicts in non-vector load 422.
|
2023-04-30 18:28:52 -07:00 |
|
Tim Dettmers
|
c35ed09b66
|
Double frag 440.
|
2023-04-30 18:19:30 -07:00 |
|
Tim Dettmers
|
604bb3fb57
|
Slow non-vector 530.
|
2023-04-30 18:06:01 -07:00 |
|
Tim Dettmers
|
ad07d254fb
|
Slow tensor core solution.
|
2023-04-30 17:43:02 -07:00 |
|
Tim Dettmers
|
21723f796a
|
4-bit draft.
|
2023-04-29 21:52:47 -07:00 |
|
Tim Dettmers
|
cad839941b
|
Added bit template.
|
2023-04-28 22:10:42 -07:00 |
|
Tim Dettmers
|
f3e97ccbd2
|
New implementation for batch size 1.
|
2023-04-28 21:29:40 -07:00 |
|
Tim Dettmers
|
f6df4aef6a
|
Added fp16 and thread/item template.
|
2023-04-28 18:26:52 -07:00 |
|
Tim Dettmers
|
3aef78342a
|
Added template refactor.
|
2023-04-28 17:34:08 -07:00 |
|
Tim Dettmers
|
c1bfb210c5
|
First baseline kernel.
|
2023-04-28 17:19:02 -07:00 |
|
Tim Dettmers
|
9cab14a3ff
|
Added pipeline draft.
|
2023-04-27 15:12:49 -07:00 |
|
Tim Dettmers
|
d1c4c20568
|
Added non-cutlass template.
|
2023-04-27 15:11:26 -07:00 |
|
Tim Dettmers
|
0afc8e9e2f
|
Best attempt at cutlass3.
|
2023-04-26 17:12:34 -07:00 |
|
Tim Dettmers
|
84964db937
|
CUTLASS compiles.
|
2023-04-25 17:15:51 -07:00 |
|
Tim Dettmers
|
6e2544da25
|
Added cutlass example.
|
2023-04-25 16:15:44 -07:00 |
|
Tim Dettmers
|
6bfd7a405f
|
Initial template.
|
2023-04-25 16:13:43 -07:00 |
|
Tim Dettmers
|
0f9d30207f
|
Added nested quantization for blockwise quantization.
|
2023-04-19 11:48:47 -07:00 |
|
Tim Dettmers
|
7dc198feb7
|
Added 32-bit optimizer for bfloat16 gradients.
|
2023-04-17 18:01:49 -07:00 |
|
Tim Dettmers
|
9e7cdc9ea9
|
Added last SwitchBack refactors. All tests green.
|
2023-04-12 13:41:30 -07:00 |
|
Tim Dettmers
|
008dfff9b4
|
Added triton utils.
|
2023-04-12 12:57:46 -07:00 |
|
Tim Dettmers
|
b8ea2b416d
|
Fixed bias conversion in Linear4bit
|
2023-04-12 12:28:35 -07:00 |
|
Tim Dettmers
|
5b612bc6df
|
Added is_available_triton guard to Triton SwitchBackLinear.
|
2023-04-12 12:16:55 -07:00 |
|
Tim Dettmers
|
c3d87e4435
|
Added is_available_triton guard.
|
2023-04-12 12:10:34 -07:00 |
|
Tim Dettmers
|
7140c01405
|
Merge branch 'main' into fp8_merge
|
2023-04-12 11:44:39 -07:00 |
|
Tim Dettmers
|
32f8c89201
|
Added missing example folder.
|
2023-04-12 11:27:31 -07:00 |
|
Tim Dettmers
|
dd562c24f1
|
Refactored simulated fp8 modules into research.nn.
|
2023-04-12 11:24:44 -07:00 |
|
Tim Dettmers
|
e67bfccbcd
|
Added missing triton and fp8 files.
|
2023-04-12 10:06:18 -07:00 |
|
Tim Dettmers
|
ec1ea63711
|
Refactored triton into its own folder. Refactored fp8 matmuls.
|
2023-04-12 09:39:39 -07:00 |
|
Tim Dettmers
|
7c651012fc
|
Added better error message for debugging on CUDA not detected failures.
|
2023-04-12 07:56:52 -07:00 |
|
Tim Dettmers
|
659a7dfc71
|
Fixing #300.
|
2023-04-11 16:14:29 -07:00 |
|
Tim Dettmers
|
eb1c331c84
|
Updates README and CHANGELOG.
|
2023-04-11 15:49:01 -07:00 |
|
Tim Dettmers
|
89e3b82731
|
Added more detailed cuda setup debug and debugging instructions.
|
2023-04-11 13:47:10 -07:00 |
|
Tim Dettmers
|
4cd63deff3
|
Fixed CUDA Conda PyTorch 2.0 issues.
|
2023-04-11 12:10:20 -07:00 |
|
Tim Dettmers
|
2bb5c00ba9
|
Added pre/post call to all lib calls. Fixes #120
|
2023-04-11 09:36:56 -07:00 |
|
Tim Dettmers
|
29ab3a6b14
|
Updated change log.
|
2023-04-11 09:26:52 -07:00 |
|
Tim Dettmers
|
2eb3108356
|
Fixed bug where beta2 was not passed into Lion 32-bit.
|
2023-04-11 09:16:01 -07:00 |
|
Tim Dettmers
|
792af5c883
|
Fixed noisy tests for 8-bit Lion.
|
2023-04-11 08:42:41 -07:00 |
|