Commit Graph

488 Commits

Author SHA1 Message Date
Tim Dettmers
72efa32962 Merge pull request #292 from justheuristic/patch-2: Support nvidia16 GPUs 2023-04-11 07:14:12 -07:00
justheuristic
5e456be50e Support 1650, 1660 2023-04-10 21:26:52 +03:00
Mitchell Wortsman
d677a71607 typo 2023-04-08 19:36:17 +00:00
Mitchell Wortsman
da524d97c9 mem efficient" 2023-04-08 19:34:18 +00:00
Tim Dettmers
e9fa03b717 Some fixed for loading PEFT modules with Params4bit. 2023-04-07 09:59:21 -07:00
Jeongseok Kang
8cceff72db Fixed typo libsbitsandbytes_cpu.so 2023-04-05 09:28:41 +09:00
Tim Dettmers
1ccb7bdec6 Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure. 2023-04-03 18:47:00 -07:00
Tim Dettmers
4ea489d3bf Refactor FP4 into 4Bit and integrate NF4 data type. 2023-04-03 11:00:12 -07:00
Tim Dettmers
64cc05920d First draft of NF4. 2023-04-02 16:10:35 -07:00
Tim Dettmers
4ad999d144 Added quantization tree generation. 2023-04-02 14:42:45 -07:00
Tim Dettmers
0d332a641f Added normal with extra value. 2023-04-02 14:09:08 -07:00
Tim Dettmers
2dd5d69056 Generalized FP4 data type. 2023-04-02 12:42:01 -07:00
Mitchell Wortsman
eb6c53cf55 clarify in readme 2023-04-01 23:50:12 +00:00
Tim Dettmers
51a21df728 Added 8-bit compression to quantization statistics. 2023-04-01 16:10:18 -07:00
Mitchell Wortsman
2331212b35 add readme for speed bench 2023-04-01 19:13:15 +00:00
Mitchell Wortsman
7f87ba83ee cleaning and refactor 2023-04-01 18:46:04 +00:00
Tim Dettmers
c4cfe4fbdd Added bf16 Adam. 2023-04-01 10:33:03 -07:00
Tim Dettmers
30d21d585c Added triton test. 2023-03-31 11:33:26 -07:00
Tim Dettmers
a13a522c4c Added first triton test. 2023-03-31 11:20:54 -07:00
Tim Dettmers
8645d1f71c Added normal quant. 2023-03-29 18:41:37 -07:00
Mitchell Wortsman
b373034e31 test 2023-03-29 19:04:53 +00:00
Mitchell Wortsman
5f3d9ada8d triton-v1 2023-03-29 06:47:08 +00:00
Tim Dettmers
69810521d3 Some small changes. 2023-03-27 09:12:57 -07:00
Mitchell Wortsman
51f8bb7133 pre-triton update 2023-03-24 05:44:42 +00:00
Ji Lin
b6383ba116 fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus 2023-03-22 22:14:57 -04:00
Phil Wang
2a6828e6fb fix comment 2023-03-22 09:56:50 -07:00
Phil Wang
978ba2db57 another tab/spaces fix 2023-03-22 09:33:47 -07:00
Phil Wang
916000c8bf fix consistent tabs / spaces 2023-03-22 09:27:13 -07:00
Phil Wang
aa9b939edd add some comments, and fix use of g_val 2023-03-22 09:22:19 -07:00
Phil Wang
a43cd2008d add some code in test_optim.py, although it seems to be failing 2023-03-22 09:14:05 -07:00
Phil Wang
9b656f461a follow advice of Tim to fix update of momentum vs parameters in blockwise 8 bit 2023-03-22 07:52:59 -07:00
Max Ryabinin
dcecbb26ca Add force_no_igemmlt to test params 2023-03-22 00:28:49 +01:00
Tim Dettmers
49a04253fb Bumped version for CUDA 12.1 support release. 2023-03-21 15:10:19 -07:00
Tim Dettmers
d032618d7f Merge pull request #180 from ubik2/patch-1: Update compile_from_source.md to mention cuda12x target 2023-03-21 14:08:32 -07:00
Tim Dettmers
1b0aabc7e4 Added CUDA 12.1. addressing #201 2023-03-21 14:06:08 -07:00
Tim Dettmers
2c8352e316 Bumped version. 2023-03-12 10:24:25 -07:00
Tim Dettmers
ec5fbf4cc4 Merge pull request #115 from kashif/patch-1: Fix for python 3.7 2023-03-12 10:22:15 -07:00
Severin Gsponer
c4866ab06e Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist 2023-03-11 15:35:23 +01:00
Phil Wang
369a51c432 switch all eps to beta2 2023-03-10 14:08:35 -08:00
Phil Wang
6c377b39b6 always pass beta2 into all the 1state functions 2023-03-10 13:00:59 -08:00
Phil Wang
abbe65adfc beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise 2023-03-10 12:50:14 -08:00
Phil Wang
19b9ef34b9 whoops 2023-03-10 08:59:49 -08:00
Phil Wang
c99b44f774 do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) 2023-03-10 08:57:59 -08:00
Phil Wang
8618bed001 swap the order in which momentum and parameters are updated in ops.cu 2023-03-10 08:39:06 -08:00
Phil Wang
c5582724d5 missed adagrad 2023-03-09 14:05:45 -08:00
Phil Wang
af03430992 fix weight decay for lion to be decoupled, using a switch 2023-03-09 14:03:07 -08:00
Phil Wang
ead570a43e remove something rmsprop specific 2023-03-09 11:58:31 -08:00
Phil Wang
c83888aa1a use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions 2023-03-09 11:54:54 -08:00
Phil Wang
64bb1ae8d1 add a sign function, for lion 2023-03-09 11:10:28 -08:00
Phil Wang
8de29fc364 forget about tests for now, will test live on local enwik8 training 2023-03-09 10:11:32 -08:00