Commit Graph

478 Commits

Author | SHA1 | Message | Date
Tim Dettmers | 0d332a641f | Added normal with extra value. | 2023-04-02 14:09:08 -07:00
Tim Dettmers | 2dd5d69056 | Generalized FP4 data type. | 2023-04-02 12:42:01 -07:00
Mitchell Wortsman | eb6c53cf55 | clarify in readme | 2023-04-01 23:50:12 +00:00
Tim Dettmers | 51a21df728 | Added 8-bit compression to quantization statistics. | 2023-04-01 16:10:18 -07:00
Mitchell Wortsman | 2331212b35 | add readme for speed bench | 2023-04-01 19:13:15 +00:00
Mitchell Wortsman | 7f87ba83ee | cleaning and refactor | 2023-04-01 18:46:04 +00:00
Tim Dettmers | c4cfe4fbdd | Added bf16 Adam. | 2023-04-01 10:33:03 -07:00
Tim Dettmers | 30d21d585c | Added triton test. | 2023-03-31 11:33:26 -07:00
Tim Dettmers | a13a522c4c | Added first triton test. | 2023-03-31 11:20:54 -07:00
Tim Dettmers | 8645d1f71c | Added normal quant. | 2023-03-29 18:41:37 -07:00
Mitchell Wortsman | b373034e31 | test | 2023-03-29 19:04:53 +00:00
Mitchell Wortsman | 5f3d9ada8d | triton-v1 | 2023-03-29 06:47:08 +00:00
Tim Dettmers | 69810521d3 | Some small changes. | 2023-03-27 09:12:57 -07:00
Mitchell Wortsman | 51f8bb7133 | pre-triton update | 2023-03-24 05:44:42 +00:00
Ji Lin | b6383ba116 | fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus | 2023-03-22 22:14:57 -04:00
Phil Wang | 2a6828e6fb | fix comment | 2023-03-22 09:56:50 -07:00
Phil Wang | 978ba2db57 | another tab/spaces fix | 2023-03-22 09:33:47 -07:00
Phil Wang | 916000c8bf | fix consistent tabs / spaces | 2023-03-22 09:27:13 -07:00
Phil Wang | aa9b939edd | add some comments, and fix use of g_val | 2023-03-22 09:22:19 -07:00
Phil Wang | a43cd2008d | add some code in test_optim.py, although it seems to be failing | 2023-03-22 09:14:05 -07:00
Phil Wang | 9b656f461a | follow advice of Tim to fix update of momentum vs parameters in blockwise 8 bit | 2023-03-22 07:52:59 -07:00
Max Ryabinin | dcecbb26ca | Add force_no_igemmlt to test params | 2023-03-22 00:28:49 +01:00
Tim Dettmers | 49a04253fb | Bumped version for CUDA 12.1 support release. | 2023-03-21 15:10:19 -07:00
Tim Dettmers | d032618d7f | Merge pull request #180 from ubik2/patch-1: Update compile_from_source.md to mention cuda12x target | 2023-03-21 14:08:32 -07:00
Tim Dettmers | 1b0aabc7e4 | Added CUDA 12.1. addressing #201 | 2023-03-21 14:06:08 -07:00
Tim Dettmers | 2c8352e316 | Bumped version. | 2023-03-12 10:24:25 -07:00
Tim Dettmers | ec5fbf4cc4 | Merge pull request #115 from kashif/patch-1: Fix for python 3.7 | 2023-03-12 10:22:15 -07:00
Severin Gsponer | c4866ab06e | Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist | 2023-03-11 15:35:23 +01:00
Phil Wang | 369a51c432 | switch all eps to beta2 | 2023-03-10 14:08:35 -08:00
Phil Wang | 6c377b39b6 | always pass beta2 into all the 1state functions | 2023-03-10 13:00:59 -08:00
Phil Wang | abbe65adfc | beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise | 2023-03-10 12:50:14 -08:00
Phil Wang | 19b9ef34b9 | whoops | 2023-03-10 08:59:49 -08:00
Phil Wang | c99b44f774 | do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) | 2023-03-10 08:57:59 -08:00
Phil Wang | 8618bed001 | swap the order in which momentum and parameters are updated in ops.cu | 2023-03-10 08:39:06 -08:00
Phil Wang | c5582724d5 | missed adagrad | 2023-03-09 14:05:45 -08:00
Phil Wang | af03430992 | fix weight decay for lion to be decoupled, using a switch | 2023-03-09 14:03:07 -08:00
Phil Wang | ead570a43e | remove something rmsprop specific | 2023-03-09 11:58:31 -08:00
Phil Wang | c83888aa1a | use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions | 2023-03-09 11:54:54 -08:00
Phil Wang | 64bb1ae8d1 | add a sign function, for lion | 2023-03-09 11:10:28 -08:00
Phil Wang | 8de29fc364 | forget about tests for now, will test live on local enwik8 training | 2023-03-09 10:11:32 -08:00
Phil Wang | cb4c3c8c66 | do a bunch of typical bookkeeping before getting to main lion logic | 2023-03-09 10:10:19 -08:00
Phil Wang | d43ea9722c | make sure interface is correct | 2023-03-09 09:45:33 -08:00
Phil Wang | 7247cb4554 | initial commit, slowly work from interface into the kernel | 2023-03-09 08:08:46 -08:00
ubik2 | dba11b0b2e | Update compile_from_source.md: Add cuda12x to the list of targets | 2023-03-06 16:57:57 -08:00
Artidoro Pagnoni | 6c31a5fe99 | t5 model fix | 2023-02-27 14:23:21 -08:00
Max Ryabinin | 24609b66af | Reduce diff | 2023-02-25 06:24:58 +01:00
Max Ryabinin | d15822a54b | Refactor _tile_indices into a cached property, fix device bug | 2023-02-25 06:23:07 +01:00
Max Ryabinin | cc608c04c2 | Revert the layout if weights were reordered | 2023-02-25 06:02:06 +01:00
Max Ryabinin | cd4d904a4c | Raise an error when loading a quantized checkpoint before quantization | 2023-02-25 06:01:34 +01:00
Max Ryabinin | ac3ab281e3 | Handle more cases in test_linear_serialization | 2023-02-25 06:01:04 +01:00