Max Ryabinin
|
b599fdb197
|
Only rearrange weight if it exists
|
2023-06-14 19:27:13 +02:00 |
|
Max Ryabinin
|
c1f3f56d2c
|
Rearrange the weights directly in state dict before loading
|
2023-06-09 21:58:39 +02:00 |
|
Max Ryabinin
|
f734076e94
|
Improve memory efficiency of 8-bit serialization
|
2023-06-09 21:39:57 +02:00 |
|
Max Ryabinin
|
4fb37d45c1
|
Extract get_tile_inds to a separate function
|
2023-06-09 21:39:37 +02:00 |
|
Tim Dettmers
|
1b8772a8f3
|
Added PagedLion and bf16 Lion.
|
2023-05-23 19:37:38 -07:00 |
|
Tim Dettmers
|
2bce175d15
|
Fixed Makefile.
|
2023-05-23 18:42:19 -07:00 |
|
Tim Dettmers
|
4bd1151829
|
Fixed gradient accumulation test.
|
2023-05-07 15:06:17 -07:00 |
|
Tim Dettmers
|
675baa79d2
|
Merge remote-tracking branch 'origin/main' into merge
|
2023-05-07 13:34:03 -07:00 |
|
Tim Dettmers
|
f64cfe65aa
|
Fixed prefetch bug for non-paged tensors; added benchmark.
|
2023-05-06 21:49:16 -07:00 |
|
Tim Dettmers
|
41a9c70814
|
Changed prefetching.
|
2023-05-06 18:59:59 -07:00 |
|
Tim Dettmers
|
44d68ff29c
|
Added paged optimizers.
|
2023-05-06 14:59:29 -07:00 |
|
Tim Dettmers
|
ec38ba95b0
|
Added paging.
|
2023-05-06 11:14:06 -07:00 |
|
Tim Dettmers
|
264a948539
|
4-bit draft; 128 vector load 240.
|
2023-05-02 16:15:38 -07:00 |
|
Tim Dettmers
|
f9bfea8f23
|
Baseline for debugging.
|
2023-05-02 07:24:12 -07:00 |
|
Tim Dettmers
|
21723f796a
|
4-bit draft.
|
2023-04-29 21:52:47 -07:00 |
|
Tim Dettmers
|
f6df4aef6a
|
Added fp16 and thread/item template.
|
2023-04-28 18:26:52 -07:00 |
|
Tim Dettmers
|
3aef78342a
|
Added template refactor.
|
2023-04-28 17:34:08 -07:00 |
|
Tim Dettmers
|
c1bfb210c5
|
First baseline kernel.
|
2023-04-28 17:19:02 -07:00 |
|
Tim Dettmers
|
9cab14a3ff
|
Added pipeline draft.
|
2023-04-27 15:12:49 -07:00 |
|
Tim Dettmers
|
d1c4c20568
|
Added non-cutlass template.
|
2023-04-27 15:11:26 -07:00 |
|
Tim Dettmers
|
0afc8e9e2f
|
Best attempt at cutlass3.
|
2023-04-26 17:12:34 -07:00 |
|
Tim Dettmers
|
84964db937
|
CUTLASS compiles.
|
2023-04-25 17:15:51 -07:00 |
|
Tim Dettmers
|
0f9d30207f
|
Added nested quantization for blockwise quantization.
|
2023-04-19 11:48:47 -07:00 |
|
Tim Dettmers
|
7dc198feb7
|
Added 32-bit optimizer for bfloat16 gradients.
|
2023-04-17 18:01:49 -07:00 |
|
Tim Dettmers
|
9e7cdc9ea9
|
Added last SwitchBack refactors. All tests green.
|
2023-04-12 13:41:30 -07:00 |
|
Tim Dettmers
|
008dfff9b4
|
Added triton utils.
|
2023-04-12 12:57:46 -07:00 |
|
Tim Dettmers
|
b8ea2b416d
|
Fixed bias conversion in Linear4bit
|
2023-04-12 12:28:35 -07:00 |
|
Tim Dettmers
|
5b612bc6df
|
Added is_available_triton guard to Triton SwitchBackLinear.
|
2023-04-12 12:16:55 -07:00 |
|
Tim Dettmers
|
c3d87e4435
|
Added is_available_triton guard.
|
2023-04-12 12:10:34 -07:00 |
|
Tim Dettmers
|
7140c01405
|
Merge branch 'main' into fp8_merge
|
2023-04-12 11:44:39 -07:00 |
|
Tim Dettmers
|
dd562c24f1
|
Refactored simulated fp8 modules into research.nn.
|
2023-04-12 11:24:44 -07:00 |
|
Tim Dettmers
|
e67bfccbcd
|
Added missing triton and fp8 files.
|
2023-04-12 10:06:18 -07:00 |
|
Tim Dettmers
|
ec1ea63711
|
Refactored triton into its own folder. Refactored fp8 matmuls.
|
2023-04-12 09:39:39 -07:00 |
|
Tim Dettmers
|
7c651012fc
|
Added better error message for debugging on CUDA not detected failures.
|
2023-04-12 07:56:52 -07:00 |
|
Tim Dettmers
|
659a7dfc71
|
Fixing #300.
|
2023-04-11 16:14:29 -07:00 |
|
Tim Dettmers
|
89e3b82731
|
Added more detailed cuda setup debug and debugging instructions.
|
2023-04-11 13:47:10 -07:00 |
|
Tim Dettmers
|
4cd63deff3
|
Fixed CUDA Conda PyTorch 2.0 issues.
|
2023-04-11 12:10:20 -07:00 |
|
Tim Dettmers
|
2bb5c00ba9
|
Added pre/post call to all lib calls. Fixes #120
|
2023-04-11 09:36:56 -07:00 |
|
Tim Dettmers
|
2eb3108356
|
Fixed bug where beta2 was not passed into Lion 32-bit.
|
2023-04-11 09:16:01 -07:00 |
|
Tim Dettmers
|
ed6f3eb146
|
Merge pull request #159 from TimDettmers/serialize_8bit
Implement proper serialization of Linear8bitLt
|
2023-04-11 07:24:51 -07:00 |
|
Tim Dettmers
|
b0ec20c3b3
|
Merge pull request #188 from lucidrains/main
Lion 8 bit
|
2023-04-11 07:22:45 -07:00 |
|
Tim Dettmers
|
d3e0e39def
|
Merge pull request #190 from svgsponer/Fix#157
Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist
|
2023-04-11 07:20:16 -07:00 |
|
Tim Dettmers
|
c7875533ce
|
Merge pull request #213 from tonylins/dev/fix_no_absmax
Fix a bug in (de)quantize_no_absmax with multiple GPUs
|
2023-04-11 07:18:24 -07:00 |
|
Tim Dettmers
|
6b4c5afe21
|
Merge pull request #260 from rapsealk/fix_libsbitsandbytes_cpu_so
Fixed typo libsbitsandbytes_cpu.so
|
2023-04-11 07:15:42 -07:00 |
|
justheuristic
|
5e456be50e
|
Support 1650, 1660
|
2023-04-10 21:26:52 +03:00 |
|
Mitchell Wortsman
|
d677a71607
|
typo
|
2023-04-08 19:36:17 +00:00 |
|
Mitchell Wortsman
|
da524d97c9
|
mem efficient
|
2023-04-08 19:34:18 +00:00 |
|
Tim Dettmers
|
e9fa03b717
|
Some fixes for loading PEFT modules with Params4bit.
|
2023-04-07 09:59:21 -07:00 |
|
Jeongseok Kang
|
8cceff72db
|
Fixed typo libsbitsandbytes_cpu.so
|
2023-04-05 09:28:41 +09:00 |
|
Tim Dettmers
|
1ccb7bdec6
|
Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure.
|
2023-04-03 18:47:00 -07:00 |
|