Commit Graph

36 Commits

Author SHA1 Message Date
Tim Dettmers
0b2ebcdab9 Added launch bounds to fix launch resource error for Lion. 2023-04-11 08:37:02 -07:00
Phil Wang
2a6828e6fb fix comment 2023-03-22 09:56:50 -07:00
Phil Wang
978ba2db57 another tab/spaces fix 2023-03-22 09:33:47 -07:00
Phil Wang
916000c8bf fix consistent tabs / spaces 2023-03-22 09:27:13 -07:00
Phil Wang
aa9b939edd add some comments, and fix use of g_val 2023-03-22 09:22:19 -07:00
Phil Wang
9b656f461a follow advice of Tim to fix update of momentum vs parameters in blockwise 8 bit 2023-03-22 07:52:59 -07:00
Phil Wang
369a51c432 switch all eps to beta2 2023-03-10 14:08:35 -08:00
Phil Wang
6c377b39b6 always pass beta2 into all the 1state functions 2023-03-10 13:00:59 -08:00
Phil Wang
abbe65adfc beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise 2023-03-10 12:50:14 -08:00
Phil Wang
c5582724d5 missed adagrad 2023-03-09 14:05:45 -08:00
Phil Wang
af03430992 fix weight decay for lion to be decoupled, using a switch 2023-03-09 14:03:07 -08:00
Phil Wang
ead570a43e remove something rmsprop specific 2023-03-09 11:58:31 -08:00
Phil Wang
c83888aa1a use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions 2023-03-09 11:54:54 -08:00
Phil Wang
64bb1ae8d1 add a sign function, for lion 2023-03-09 11:10:28 -08:00
Phil Wang
cb4c3c8c66 do a bunch of typical bookkeeping before getting to main lion logic 2023-03-09 10:10:19 -08:00
Tim Dettmers
c91f592ad7 Merge branch 'main' into cleanup 2023-01-02 11:19:16 +01:00
Tim Dettmers
c059bd2848 Added additional blocksizes: {64, 128, 256}. 2022-11-20 14:18:15 -08:00
Tom Aarsen
b104ce3b62 Merge branch 'main' into cleanup 2022-11-17 15:22:29 +01:00
Tim Dettmers
08fa2e7b01 Fixed bug in cpu quant; faster GPU dequant. 2022-11-07 18:06:18 -08:00
Tim Dettmers
6bc2b992be Added blocksizes 2048, 1024, and 512 to blockwise quant. 2022-11-06 16:27:48 -08:00
Tom Aarsen
1eec77d34c Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
Tim Dettmers
dede343033 Added fused bias in dequant_mm. 2022-08-16 11:12:09 -07:00
Tim Dettmers
1ed2fa2f21 Removed storage() from get_ptr; added boilerplate for bias dequant_mm. 2022-08-16 10:56:17 -07:00
Tim Dettmers
5737f2b027 Merge branch 'patch_merge' into extract_outliers 2022-07-26 19:38:01 -07:00
Tim Dettmers
32fa459ed7 Added col_ampere outlier extraction kernel. 2022-07-26 18:15:51 -07:00
Tim Dettmers
bcab99ec87 Working outlier extraction for Turing. 2022-07-26 17:39:30 -07:00
Tim Dettmers
cbb901ac51 Boilerplate and test for extract_outliers. 2022-07-26 12:12:38 -07:00
Tim Dettmers
9268dc9d88 Some progress on build script; added multi-cuda install script. 2022-07-25 19:30:37 -07:00
Tim Dettmers
7d2ecd30c0 Fixed rowcol synchronization bug. 2022-07-22 15:21:37 -07:00
Tim Dettmers
c771b3a75a Most tests passing. 2022-07-22 14:41:05 -07:00
Tim Dettmers
2f8083bd8b Added AdamW. #10 #13 2021-11-28 21:18:11 -08:00
Tim Dettmers
8b3c0f355c Added adagrad with tests (no clipping). 2021-11-10 15:10:02 -08:00
Tim Dettmers
0fb378b4ee Added compilation from source instructions; easier compilation. 2021-10-21 17:22:43 -07:00
Tim Dettmers
a6eae2e7f2 Added skip_zeros; tests are passing. 2021-10-20 19:15:47 -07:00
Tim Dettmers
bb34fd50a1 Initial plumbing for skip_zeros. 2021-10-20 18:37:44 -07:00
Tim Dettmers
7439924891 Initial commit 2021-10-05 19:16:20 -07:00