Commit Graph

265 Commits

Author SHA1 Message Date
Phil Wang
c99b44f774 do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) 2023-03-10 08:57:59 -08:00
Phil Wang
8618bed001 swap the order in which momentum and parameters are updated in ops.cu 2023-03-10 08:39:06 -08:00
Phil Wang
c5582724d5 missed adagrad 2023-03-09 14:05:45 -08:00
Phil Wang
af03430992 fix weight decay for lion to be decoupled, using a switch 2023-03-09 14:03:07 -08:00
Phil Wang
ead570a43e remove something rmsprop specific 2023-03-09 11:58:31 -08:00
Phil Wang
c83888aa1a use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions 2023-03-09 11:54:54 -08:00
Phil Wang
64bb1ae8d1 add a sign function, for lion 2023-03-09 11:10:28 -08:00
Phil Wang
8de29fc364 forget about tests for now, will test live on local enwik8 training 2023-03-09 10:11:32 -08:00
Phil Wang
cb4c3c8c66 do a bunch of typical bookkeeping before getting to main lion logic 2023-03-09 10:10:19 -08:00
Phil Wang
d43ea9722c make sure interface is correct 2023-03-09 09:45:33 -08:00
Phil Wang
7247cb4554 initial commit, slowly work from interface into the kernel 2023-03-09 08:08:46 -08:00
Tim Dettmers
0f5c394870 Added version 0.37.0. 2023-02-01 20:27:01 -08:00
Tim Dettmers
de53588934 Added Int8 matmul support for all GPUs. Full backward support. 2023-02-01 20:09:31 -08:00
Tim Dettmers
92ab6a8d5f Merge pull request #119 from stas00/patch-1 (improve install instructions) 2023-02-01 19:21:36 -08:00
Stas Bekman
c5372a8567 improve install instructions 2023-01-05 13:34:51 -08:00
Tim Dettmers
1341fb44ad Fixed issue where the CUDA SETUP was not printed. 2023-01-04 03:50:53 -08:00
Tim Dettmers
3901ebf7ae Added CUDA 12.0 support; removed CC 3.0 support. 2023-01-04 02:28:33 -08:00
Tim Dettmers
b3de19218e Added error message for unexpected CUDA exception. 2023-01-03 06:57:07 -08:00
Tim Dettmers
81990491ff Merge pull request #113 from Borzik/fix-warnings (Import missing warn function) 2023-01-03 15:46:58 +01:00
Tim Dettmers
9180b4cc11 Added additional error message for cudart error #85 2023-01-03 06:44:11 -08:00
Tim Dettmers
dfb049f8e4 Added Python >= 3.8 requirement. 2023-01-03 06:20:06 -08:00
Tim Dettmers
211ad594df Added error+instructions for unsupported CUDA 10.0 version #82 2023-01-03 06:07:35 -08:00
Felix Borzik
f3800bab75 import warn function 2023-01-03 13:23:34 +00:00
Tim Dettmers
9d353ca786 Merge pull request #87 from lostmsu/main (Add `device` and `dtype` parameters to `StableEmbedding`) 2023-01-02 13:22:45 +01:00
Tim Dettmers
7a6563b6c8 Default to CPU library on CUDA error+small refactor. 2023-01-02 03:47:09 -08:00
Tim Dettmers
d9112dc55b Merge pull request #110 from BlackHC/cublaslt_version (Improve cc version detection for cublaslt) 2023-01-02 12:35:53 +01:00
Tim Dettmers
336e24696c CUDASetup only executed once + fixed circular import. 2023-01-02 03:31:43 -08:00
Tim Dettmers
df9a9b0c4c Merge pull request #77 from Cyberes/main (Allow hiding of the welcome message) 2023-01-02 11:28:17 +01:00
Tim Dettmers
be5cecb88f Merge branch 'main' into main 2023-01-02 11:23:17 +01:00
Tim Dettmers
f0ec93d016 Merge pull request #76 from tomaarsen/cleanup (Cleanup involving a handful of failures, some optimization and a lot of code quality improvements) 2023-01-02 11:19:28 +01:00
Tim Dettmers
c91f592ad7 Merge branch 'main' into cleanup 2023-01-02 11:19:16 +01:00
blackhc
ed17aa9a31 Don't mark it as failure though. 2022-12-29 23:50:48 +00:00
blackhc
7b39a5511d Fix issue #97 2022-12-29 23:47:21 +00:00
Tim Dettmers
c059bd2848 Added additional blocksizes: {64, 128, 256}. 2022-11-20 14:18:15 -08:00
Tim Dettmers
eb028e6ebc Fixed k-bit quantization maps. 2022-11-19 07:24:03 -08:00
Tom Aarsen
b104ce3b62 Merge branch 'main' into cleanup 2022-11-17 15:22:29 +01:00
Tim Dettmers
08fa2e7b01 Fixed bug in cpu quant; faster GPU dequant. 2022-11-07 18:06:18 -08:00
Tim Dettmers
62a333ac40 Added pre/post calls to quantize_blockwise. 2022-11-06 17:17:51 -08:00
Tim Dettmers
e0e697b150 Fixed blockwise test and logic. 2022-11-06 16:36:31 -08:00
Tim Dettmers
6bc2b992be Added blocksizes 2048, 1024, and 512 to blockwise quant. 2022-11-06 16:27:48 -08:00
Tim Dettmers
2f2063bac2 Added k<256 quantile estimate. 2022-11-06 13:05:25 -08:00
Tim Dettmers
98cbc4bc4f Added k-bit fp8 map. 2022-11-06 11:59:37 -08:00
Tim Dettmers
caf1832526 Added k-bit linear quantization. 2022-11-06 11:47:54 -08:00
Victor Nova
62d39a237c add device and dtype parameters to StableEmbedding 2022-11-04 14:12:46 -07:00
Tim Dettmers
1efb87d89d Added FP8 quantization map. 2022-11-03 19:49:50 -07:00
Tom Aarsen
62c0bd2278 Fix several typos in logging and comments (Via codespell) 2022-11-01 09:53:47 +01:00
Tom Aarsen
d504050ff7 Call isort over cuda_setup/main.py 2022-11-01 09:46:03 +01:00
Tom Aarsen
30f28b94a0 Merge branch 'main' into cleanup 2022-11-01 09:43:49 +01:00
Tim Dettmers
8d87c0b852 Fixed CUDA setup bugs, including #81. 2022-10-31 18:04:49 -07:00
adpkadspokasdk
8724c990c7 allow hiding of the welcome message 2022-10-27 16:04:49 -06:00