Tim Dettmers
|
1b8772a8f3
|
Added PagedLion and bf16 Lion.
|
2023-05-23 19:37:38 -07:00 |
|
Tim Dettmers
|
2bce175d15
|
Fixed Makefile.
|
2023-05-23 18:42:19 -07:00 |
|
Tim Dettmers
|
4bd1151829
|
Fixed gradient accumulation test.
|
2023-05-07 15:06:17 -07:00 |
|
Tim Dettmers
|
675baa79d2
|
Merge remote-tracking branch 'origin/main' into merge
|
2023-05-07 13:34:03 -07:00 |
|
Tim Dettmers
|
f64cfe65aa
|
Fixed prefetch bug for non-paged tensors; added benchmark.
|
2023-05-06 21:49:16 -07:00 |
|
Tim Dettmers
|
44d68ff29c
|
Added paged optimizers.
|
2023-05-06 14:59:29 -07:00 |
|
Tim Dettmers
|
ec38ba95b0
|
Added paging.
|
2023-05-06 11:14:06 -07:00 |
|
Tim Dettmers
|
264a948539
|
4-bit draft; 128 vector load 240.
|
2023-05-02 16:15:38 -07:00 |
|
Tim Dettmers
|
869b7e83b5
|
Warp multi-specialization 240.
|
2023-05-02 12:10:32 -07:00 |
|
Tim Dettmers
|
77f15fdce9
|
Shared memory efficient 240.
|
2023-05-02 11:38:11 -07:00 |
|
Tim Dettmers
|
394749db71
|
Correct implementation 240.
|
2023-05-02 08:58:59 -07:00 |
|
Tim Dettmers
|
9aa232cc39
|
Initial.
|
2023-05-02 07:53:29 -07:00 |
|
Tim Dettmers
|
9192c9de64
|
Tighter and scaled error analysis.
|
2023-05-02 07:50:32 -07:00 |
|
Tim Dettmers
|
f9bfea8f23
|
Baseline for debugging.
|
2023-05-02 07:24:12 -07:00 |
|
Tim Dettmers
|
7cc8ff4727
|
Warp specialization 362.
|
2023-05-01 08:21:12 -07:00 |
|
Tim Dettmers
|
c35ed09b66
|
Double frag 440.
|
2023-04-30 18:19:30 -07:00 |
|
Tim Dettmers
|
ad07d254fb
|
Slow tensor core solution.
|
2023-04-30 17:43:02 -07:00 |
|
Tim Dettmers
|
21723f796a
|
4-bit draft.
|
2023-04-29 21:52:47 -07:00 |
|
Tim Dettmers
|
cad839941b
|
Added bit template.
|
2023-04-28 22:10:42 -07:00 |
|
Tim Dettmers
|
f3e97ccbd2
|
New implementation for batch size 1.
|
2023-04-28 21:29:40 -07:00 |
|
Tim Dettmers
|
f6df4aef6a
|
Added fp16 and thread/item template.
|
2023-04-28 18:26:52 -07:00 |
|
Tim Dettmers
|
c1bfb210c5
|
First baseline kernel.
|
2023-04-28 17:19:02 -07:00 |
|
Tim Dettmers
|
9cab14a3ff
|
Added pipeline draft.
|
2023-04-27 15:12:49 -07:00 |
|
Tim Dettmers
|
d1c4c20568
|
Added non-cutlass template.
|
2023-04-27 15:11:26 -07:00 |
|
Tim Dettmers
|
0afc8e9e2f
|
Best attempt at cutlass3.
|
2023-04-26 17:12:34 -07:00 |
|
Tim Dettmers
|
0f9d30207f
|
Added nested quantization for blockwise quantization.
|
2023-04-19 11:48:47 -07:00 |
|
Tim Dettmers
|
7dc198feb7
|
Added 32-bit optimizer for bfloat16 gradients.
|
2023-04-17 18:01:49 -07:00 |
|
Tim Dettmers
|
9e7cdc9ea9
|
Added last SwitchBack refactors. All tests green.
|
2023-04-12 13:41:30 -07:00 |
|
Tim Dettmers
|
7140c01405
|
Merge branch 'main' into fp8_merge
|
2023-04-12 11:44:39 -07:00 |
|
Tim Dettmers
|
dd562c24f1
|
Refactored simulated fp8 modules into research.nn.
|
2023-04-12 11:24:44 -07:00 |
|
Tim Dettmers
|
ec1ea63711
|
Refactored triton into its own folder. Refactored fp8 matmuls.
|
2023-04-12 09:39:39 -07:00 |
|
Tim Dettmers
|
4cd63deff3
|
Fixed CUDA Conda PyTorch 2.0 issues.
|
2023-04-11 12:10:20 -07:00 |
|
Tim Dettmers
|
2eb3108356
|
Fixed bug where beta2 was not passed into Lion 32-bit.
|
2023-04-11 09:16:01 -07:00 |
|
Tim Dettmers
|
792af5c883
|
Fixed noisy tests for 8-bit Lion.
|
2023-04-11 08:42:41 -07:00 |
|
Tim Dettmers
|
ed6f3eb146
|
Merge pull request #159 from TimDettmers/serialize_8bit
Implement proper serialization of Linear8bitLt
|
2023-04-11 07:24:51 -07:00 |
|
Tim Dettmers
|
e9fa03b717
|
Some fixes for loading PEFT modules with Params4bit.
|
2023-04-07 09:59:21 -07:00 |
|
Tim Dettmers
|
1ccb7bdec6
|
Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure.
|
2023-04-03 18:47:00 -07:00 |
|
Tim Dettmers
|
4ea489d3bf
|
Refactor FP4 into 4Bit and integrate NF4 data type.
|
2023-04-03 11:00:12 -07:00 |
|
Tim Dettmers
|
64cc05920d
|
First draft of NF4.
|
2023-04-02 16:10:35 -07:00 |
|
Tim Dettmers
|
4ad999d144
|
Added quantization tree generation.
|
2023-04-02 14:42:45 -07:00 |
|
Tim Dettmers
|
0d332a641f
|
Added normal with extra value.
|
2023-04-02 14:09:08 -07:00 |
|
Tim Dettmers
|
2dd5d69056
|
Generalized FP4 data type.
|
2023-04-02 12:42:01 -07:00 |
|
Tim Dettmers
|
51a21df728
|
Added 8-bit compression to quantization statistics.
|
2023-04-01 16:10:18 -07:00 |
|
Mitchell Wortsman
|
7f87ba83ee
|
cleaning and refactor
|
2023-04-01 18:46:04 +00:00 |
|
Tim Dettmers
|
c4cfe4fbdd
|
Added bf16 Adam.
|
2023-04-01 10:33:03 -07:00 |
|
Tim Dettmers
|
30d21d585c
|
Added triton test.
|
2023-03-31 11:33:26 -07:00 |
|
Tim Dettmers
|
a13a522c4c
|
Added first triton test.
|
2023-03-31 11:20:54 -07:00 |
|
Tim Dettmers
|
8645d1f71c
|
Added normal quant.
|
2023-03-29 18:41:37 -07:00 |
|
Mitchell Wortsman
|
b373034e31
|
test
|
2023-03-29 19:04:53 +00:00 |
|
Mitchell Wortsman
|
5f3d9ada8d
|
triton-v1
|
2023-03-29 06:47:08 +00:00 |
|