Commit Graph

44 Commits

Author SHA1 Message Date
Phil Wang
c99b44f774 do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) 2023-03-10 08:57:59 -08:00
Phil Wang
8618bed001 swap the order in which momentum and parameters are updated in ops.cu 2023-03-10 08:39:06 -08:00
Phil Wang
c5582724d5 missed adagrad 2023-03-09 14:05:45 -08:00
Phil Wang
af03430992 fix weight decay for lion to be decoupled, using a switch 2023-03-09 14:03:07 -08:00
Phil Wang
ead570a43e remove something rmsprop specific 2023-03-09 11:58:31 -08:00
Phil Wang
c83888aa1a use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions 2023-03-09 11:54:54 -08:00
Phil Wang
64bb1ae8d1 add a sign function, for lion 2023-03-09 11:10:28 -08:00
Phil Wang
cb4c3c8c66 do a bunch of typical bookkeeping before getting to main lion logic 2023-03-09 10:10:19 -08:00
Tim Dettmers
c91f592ad7
Merge branch 'main' into cleanup 2023-01-02 11:19:16 +01:00
Tim Dettmers
c059bd2848 Added additional blocksizes: {64, 128, 256}. 2022-11-20 14:18:15 -08:00
Tom Aarsen
b104ce3b62
Merge branch 'main' into cleanup 2022-11-17 15:22:29 +01:00
Tim Dettmers
08fa2e7b01 Fixed bug in cpu quant; faster GPU dequant. 2022-11-07 18:06:18 -08:00
Tim Dettmers
6bc2b992be Added blocksizes 2048, 1024, and 512 to blockwise quant. 2022-11-06 16:27:48 -08:00
Tom Aarsen
1eec77d34c Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
Tim Dettmers
c05dd42ddd Fixed cpu blockwise quantization for small input tensors. 2022-09-13 10:37:53 -07:00
Tim Dettmers
19a7adca7a Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
Tim Dettmers
ee5b947e63 Fixed issue where Pascal was not displaying proper error. 2022-08-23 16:00:26 -07:00
Tim Dettmers
a6664de072 Enhanced error handling in CUDA SETUP failures. 2022-08-16 19:03:19 -07:00
Tim Dettmers
dede343033 Added fused bias in dequant_mm. 2022-08-16 11:12:09 -07:00
Tim Dettmers
1ed2fa2f21 Removed storage() from get_ptr; added boilerplate for bias dequant_mm. 2022-08-16 10:56:17 -07:00
Tim Dettmers
a4532c59f7 Removed faulty asserts. 2022-08-06 09:31:05 -07:00
Tim Dettmers
cc5b323876 Merge branch 'extract_outliers' into debug 2022-08-04 07:40:48 -07:00
Tim Dettmers
451fd9506e Added fixes for the case that matmullt dim A is zero, e.g. [0, 768]. 2022-08-03 11:54:01 -07:00
Tim Dettmers
2f01865a2f Added CUDA block assert and is_on_gpu check. 2022-08-03 09:05:37 -07:00
Tim Dettmers
5737f2b027 Merge branch 'patch_merge' into extract_outliers 2022-07-26 19:38:01 -07:00
Tim Dettmers
32fa459ed7 Added col_ampere outlier extraction kernel. 2022-07-26 18:15:51 -07:00
Tim Dettmers
bcab99ec87 Working outlier extraction for Turing. 2022-07-26 17:39:30 -07:00
Tim Dettmers
cbb901ac51 Boilerplate and test for extract_outliers. 2022-07-26 12:12:38 -07:00
Tim Dettmers
953b7285dd Fixed cpuonly build. 2022-07-26 09:12:16 -07:00
Tim Dettmers
9268dc9d88 Some progress on build script; added multi-cuda install script. 2022-07-25 19:30:37 -07:00
Tim Dettmers
8b1fd32e3e Fixed makefile; fixed Ampere igemmlt_8 bug. 2022-07-25 14:02:14 -07:00
Tim Dettmers
7d2ecd30c0 Fixed rowcol synchronization bug. 2022-07-22 15:21:37 -07:00
Tim Dettmers
c771b3a75a Most tests passing. 2022-07-22 14:41:05 -07:00
Max Ryabinin
025824d29b Reduce diff 2022-07-01 17:42:58 +03:00
Max Ryabinin
575aa698fa Reduce diff 2022-07-01 17:41:48 +03:00
Max Ryabinin
4d1d5b569f Reduce diff 2022-07-01 17:40:02 +03:00
Max Ryabinin
31ce1b3708 Reduce diff 2022-07-01 17:36:30 +03:00
Max Ryabinin
8258b4364a Add a CPU-only build option 2022-07-01 17:16:10 +03:00
Tim Dettmers
2f8083bd8b Added AdamW. #10 #13 2021-11-28 21:18:11 -08:00
Tim Dettmers
8b3c0f355c Added adagrad with tests (no clipping). 2021-11-10 15:10:02 -08:00
Tim Dettmers
0fb378b4ee Added compilation from source instructions; easier compilation. 2021-10-21 17:22:43 -07:00
Tim Dettmers
a6eae2e7f2 Added skip_zeros; tests are passing. 2021-10-20 19:15:47 -07:00
Tim Dettmers
bb34fd50a1 Initial plumbing for skip_zeros. 2021-10-20 18:37:44 -07:00
Tim Dettmers
7439924891 Initial commit 2021-10-05 19:16:20 -07:00