Tim Dettmers
|
196d6f5dc1
|
Merge pull request #469 from shadeMe/linear-layer-device
Add `device` parameter to `Linear` subclasses and `Embedding`
|
2023-07-10 06:17:13 -07:00 |
|
Tim Dettmers
|
5fab673442
|
Added fp32 compute type for gemv_4bit.
|
2023-07-09 21:06:01 -07:00 |
|
Tim Dettmers
|
cef519c89e
|
Added test for Param4bit.to() and fixed double quant behavior.
|
2023-07-09 17:16:50 -07:00 |
|
Tim Dettmers
|
6a905be5ce
|
Fixed a bug where gemv_4bit would return a wrongly sized tensor.
|
2023-07-09 15:34:02 -07:00 |
|
Tim Dettmers
|
0f0390acb2
|
Added double quantization support and tests.
|
2023-07-09 15:32:03 -07:00 |
|
Tim Dettmers
|
94168d79d7
|
Added FP4 fast inference support.
|
2023-07-09 14:46:19 -07:00 |
|
Tim Dettmers
|
4b88d69de7
|
Added arbitrary data types; fixed a bug for small matrices.
|
2023-07-09 12:04:09 -07:00 |
|
Tim Dettmers
|
eefbf60270
|
Tuning optimization (float accumulation). 185 vs 50.
|
2023-07-08 16:31:58 -07:00 |
|
Tim Dettmers
|
7e49b5b938
|
Added warp_shuffle indexing 185 vs 54.
|
2023-07-08 14:27:12 -07:00 |
|
Alessandro Pietro Bardelli
|
463630dc73
|
[BugFix] replace view+continuous with reshape
|
2023-07-06 12:26:03 +02:00 |
|
Jeongseok Kang
|
a24aae30bf
|
Merge branch 'main' into fix/libcuda-to-torch
|
2023-07-06 15:43:42 +09:00 |
|
Tim Dettmers
|
02fd80cb81
|
Added bfloat16 quantizations and tests.
|
2023-07-04 19:58:31 -07:00 |
|
Tim Dettmers
|
dfe6900b94
|
Vectorized loads, conflict free NF4; 52 vs 172.
|
2023-07-04 15:20:10 -07:00 |
|
Tim Dettmers
|
f89ff93e26
|
Initial 4-bit naive batch size 1, 81 vs 185.
|
2023-07-03 18:45:38 -07:00 |
|
Tim Dettmers
|
4395d68cf6
|
Release 0.39.1.
|
2023-06-19 19:40:41 -07:00 |
|
Tim Dettmers
|
2d321a7524
|
Merge pull request #503 from TimDettmers/efficient_8bit_serialize
Make 8-bit serialization more memory-efficient (v2)
|
2023-06-19 11:28:30 -07:00 |
|
Max Ryabinin
|
b599fdb197
|
Only rearrange weight if it exists
|
2023-06-14 19:27:13 +02:00 |
|
Max Ryabinin
|
c1f3f56d2c
|
Rearrange the weights directly in state dict before loading
|
2023-06-09 21:58:39 +02:00 |
|
Max Ryabinin
|
f734076e94
|
Improve memory efficiency of 8-bit serialization
|
2023-06-09 21:39:57 +02:00 |
|
Max Ryabinin
|
4fb37d45c1
|
Extract get_tile_inds to a separate function
|
2023-06-09 21:39:37 +02:00 |
|
shadeMe
|
db49ad43ab
|
Add device parameter to Embedding
|
2023-06-01 17:43:49 +02:00 |
|
shadeMe
|
9cac5dd1b6
|
Add device parameter to Linear subclasses
|
2023-06-01 17:43:30 +02:00 |
|
Tim Dettmers
|
e54d2730fc
|
Added debugging functions.
|
2023-05-30 20:42:21 -07:00 |
|
Tim Dettmers
|
b7f04e2a20
|
Added lookup table.
|
2023-05-30 20:07:05 -07:00 |
|
Tim Dettmers
|
ac5550a023
|
Added changes for deployment.
|
2023-05-30 19:06:59 -07:00 |
|
Tim Dettmers
|
0f40fa3f0a
|
Bumped version.
|
2023-05-23 19:55:52 -07:00 |
|
Tim Dettmers
|
1b8772a8f3
|
Added PagedLion and bf16 Lion.
|
2023-05-23 19:37:38 -07:00 |
|
Tim Dettmers
|
2bce175d15
|
Fixed Makefile.
|
2023-05-23 18:42:19 -07:00 |
|
Tim Dettmers
|
4bd1151829
|
Fixed gradient accumulation test.
|
2023-05-07 15:06:17 -07:00 |
|
Tim Dettmers
|
675baa79d2
|
Merge remote-tracking branch 'origin/main' into merge
|
2023-05-07 13:34:03 -07:00 |
|
Tim Dettmers
|
f64cfe65aa
|
Fixed prefetch bug for non-paged tensors; added benchmark.
|
2023-05-06 21:49:16 -07:00 |
|
Tim Dettmers
|
41a9c70814
|
Changed prefetching.
|
2023-05-06 18:59:59 -07:00 |
|
Tim Dettmers
|
44d68ff29c
|
Added paged optimizers.
|
2023-05-06 14:59:29 -07:00 |
|
Tim Dettmers
|
ec38ba95b0
|
Added paging.
|
2023-05-06 11:14:06 -07:00 |
|
Tim Dettmers
|
264a948539
|
4-bit draft; 128 vector load 240.
|
2023-05-02 16:15:38 -07:00 |
|
Tim Dettmers
|
869b7e83b5
|
Warp multi-specialization 240.
|
2023-05-02 12:10:32 -07:00 |
|
Tim Dettmers
|
77f15fdce9
|
Shared memory efficient 240.
|
2023-05-02 11:38:11 -07:00 |
|
Tim Dettmers
|
89cccd8196
|
A tile multi-tiling.
|
2023-05-02 09:40:31 -07:00 |
|
Tim Dettmers
|
4decb3cc68
|
Removed unnecessary sync.
|
2023-05-02 09:38:14 -07:00 |
|
Tim Dettmers
|
394749db71
|
Correct implementation 240.
|
2023-05-02 08:58:59 -07:00 |
|
Tim Dettmers
|
9aa232cc39
|
Initial.
|
2023-05-02 07:53:29 -07:00 |
|
Tim Dettmers
|
9192c9de64
|
Tighter and scaled error analysis.
|
2023-05-02 07:50:32 -07:00 |
|
Tim Dettmers
|
f9bfea8f23
|
Baseline for debugging.
|
2023-05-02 07:24:12 -07:00 |
|
Tim Dettmers
|
7bfa09d0fc
|
8x32 240 6 warps.
|
2023-05-01 16:38:09 -07:00 |
|
Tim Dettmers
|
3d4a2eadd3
|
16x16 240.
|
2023-05-01 16:23:45 -07:00 |
|
Tim Dettmers
|
7cc8ff4727
|
Warp specialization 362.
|
2023-05-01 08:21:12 -07:00 |
|
Tim Dettmers
|
cabcd9b9d5
|
Halved shared memory 466.
|
2023-04-30 19:12:42 -07:00 |
|
Tim Dettmers
|
30d03e0254
|
64 threads, high smem, 434.
|
2023-04-30 18:55:12 -07:00 |
|
Tim Dettmers
|
e01d4e033d
|
Fixed bank conflicts in non-vector load 422.
|
2023-04-30 18:28:52 -07:00 |
|
Tim Dettmers
|
c35ed09b66
|
Double frag 440.
|
2023-04-30 18:19:30 -07:00 |
|