Commit Graph

414 Commits

Author SHA1 Message Date
Tim Dettmers
196d6f5dc1 Merge pull request #469 from shadeMe/linear-layer-device: Add `device` parameter to `Linear` subclasses and `Embedding` 2023-07-10 06:17:13 -07:00
Tim Dettmers
4395d68cf6 Release 0.39.1. 2023-06-19 19:40:41 -07:00
Tim Dettmers
2d321a7524 Merge pull request #503 from TimDettmers/efficient_8bit_serialize: Make 8-bit serialization more memory-efficient (v2) 2023-06-19 11:28:30 -07:00
Max Ryabinin
b599fdb197 Only rearrange weight if it exists 2023-06-14 19:27:13 +02:00
Max Ryabinin
c1f3f56d2c Rearrange the weights directly in state dict before loading 2023-06-09 21:58:39 +02:00
Max Ryabinin
f734076e94 Improve memory efficiency of 8-bit serialization 2023-06-09 21:39:57 +02:00
Max Ryabinin
4fb37d45c1 Extract get_tile_inds to a separate function 2023-06-09 21:39:37 +02:00
shadeMe
db49ad43ab Add device parameter to Embedding 2023-06-01 17:43:49 +02:00
shadeMe
9cac5dd1b6 Add device parameter to Linear subclasses 2023-06-01 17:43:30 +02:00
Tim Dettmers
ac5550a023 Added changes for deployment. 2023-05-30 19:06:59 -07:00
Tim Dettmers
0f40fa3f0a Bumped version. 2023-05-23 19:55:52 -07:00
Tim Dettmers
1b8772a8f3 Added PagedLion and bf16 Lion. 2023-05-23 19:37:38 -07:00
Tim Dettmers
2bce175d15 Fixed Makefile. 2023-05-23 18:42:19 -07:00
Tim Dettmers
4bd1151829 Fixed gradient accumulation test. 2023-05-07 15:06:17 -07:00
Tim Dettmers
675baa79d2 Merge remote-tracking branch 'origin/main' into merge 2023-05-07 13:34:03 -07:00
Tim Dettmers
f64cfe65aa Fixed prefetch bug for non-paged tensors; added benchmark. 2023-05-06 21:49:16 -07:00
Tim Dettmers
41a9c70814 Changed prefetching. 2023-05-06 18:59:59 -07:00
Tim Dettmers
44d68ff29c Added paged optimizers. 2023-05-06 14:59:29 -07:00
Tim Dettmers
ec38ba95b0 Added paging. 2023-05-06 11:14:06 -07:00
Tim Dettmers
264a948539 4-bit draft; 128 vector load 240. 2023-05-02 16:15:38 -07:00
Tim Dettmers
869b7e83b5 Warp multi-specialization 240. 2023-05-02 12:10:32 -07:00
Tim Dettmers
77f15fdce9 Shared memory efficient 240. 2023-05-02 11:38:11 -07:00
Tim Dettmers
89cccd8196 A tile multi-tiling. 2023-05-02 09:40:31 -07:00
Tim Dettmers
4decb3cc68 Removed unnecessary sync. 2023-05-02 09:38:14 -07:00
Tim Dettmers
394749db71 Correct implementation 240. 2023-05-02 08:58:59 -07:00
Tim Dettmers
9aa232cc39 Initial. 2023-05-02 07:53:29 -07:00
Tim Dettmers
9192c9de64 Tighter and scaled error analysis. 2023-05-02 07:50:32 -07:00
Tim Dettmers
f9bfea8f23 Baseline for debugging. 2023-05-02 07:24:12 -07:00
Tim Dettmers
7bfa09d0fc 8x32 240 6 warps. 2023-05-01 16:38:09 -07:00
Tim Dettmers
3d4a2eadd3 16x16 240. 2023-05-01 16:23:45 -07:00
Tim Dettmers
7cc8ff4727 Warp specialization 362. 2023-05-01 08:21:12 -07:00
Tim Dettmers
cabcd9b9d5 Halved shared memory 466. 2023-04-30 19:12:42 -07:00
Tim Dettmers
30d03e0254 64 threads, high smem, 434. 2023-04-30 18:55:12 -07:00
Tim Dettmers
e01d4e033d Fixed bank conflicts in non-vector load 422. 2023-04-30 18:28:52 -07:00
Tim Dettmers
c35ed09b66 Double frag 440. 2023-04-30 18:19:30 -07:00
Tim Dettmers
604bb3fb57 Slow non-vector 530. 2023-04-30 18:06:01 -07:00
Tim Dettmers
ad07d254fb Slow tensor core solution. 2023-04-30 17:43:02 -07:00
Tim Dettmers
21723f796a 4-bit draft. 2023-04-29 21:52:47 -07:00
Tim Dettmers
cad839941b Added bit template. 2023-04-28 22:10:42 -07:00
Tim Dettmers
f3e97ccbd2 New implementation for batch size 1. 2023-04-28 21:29:40 -07:00
Tim Dettmers
f6df4aef6a Added fp16 and thread/item template. 2023-04-28 18:26:52 -07:00
Tim Dettmers
3aef78342a Added template refactor. 2023-04-28 17:34:08 -07:00
Tim Dettmers
c1bfb210c5 First baseline kernel. 2023-04-28 17:19:02 -07:00
Tim Dettmers
9cab14a3ff Added pipeline draft. 2023-04-27 15:12:49 -07:00
Tim Dettmers
d1c4c20568 Added non-cutlass template. 2023-04-27 15:11:26 -07:00
Tim Dettmers
0afc8e9e2f Best attempt at cutlass3. 2023-04-26 17:12:34 -07:00
Tim Dettmers
84964db937 CUTLASS compiles. 2023-04-25 17:15:51 -07:00
Tim Dettmers
6e2544da25 Added cutlass example. 2023-04-25 16:15:44 -07:00
Tim Dettmers
6bfd7a405f Initial template. 2023-04-25 16:13:43 -07:00
Tim Dettmers
0f9d30207f Added nested quantization for blockwise quantization. 2023-04-19 11:48:47 -07:00