Commit Graph

311 Commits

Author SHA1 Message Date
Tim Dettmers
7c651012fc Added better error message for debugging on CUDA not detected failures. 2023-04-12 07:56:52 -07:00
Tim Dettmers
659a7dfc71 Fixing #300. 2023-04-11 16:14:29 -07:00
Tim Dettmers
eb1c331c84 Updates README and CHANGELOG. 2023-04-11 15:49:01 -07:00
Tim Dettmers
89e3b82731 Added more detailed cuda setup debug and debugging instructions. 2023-04-11 13:47:10 -07:00
Tim Dettmers
4cd63deff3 Fixed CUDA Conda PyTorch 2.0 issues. 2023-04-11 12:10:20 -07:00
Tim Dettmers
2bb5c00ba9 Added pre/post call to all lib calls. Fixes #120 2023-04-11 09:36:56 -07:00
Tim Dettmers
29ab3a6b14 Updated change log. 2023-04-11 09:26:52 -07:00
Tim Dettmers
2eb3108356 Fixed bug where beta2 was not passed into Lion 32-bit. 2023-04-11 09:16:01 -07:00
Tim Dettmers
792af5c883 Fixed noisy tests for 8-bit Lion. 2023-04-11 08:42:41 -07:00
Tim Dettmers
0b2ebcdab9 Added launch bounds to fix launch resource error for Lion. 2023-04-11 08:37:02 -07:00
Tim Dettmers
ed6f3eb146
Merge pull request #159 from TimDettmers/serialize_8bit
Implement proper serialization of Linear8bitLt
2023-04-11 07:24:51 -07:00
Tim Dettmers
b0ec20c3b3
Merge pull request #188 from lucidrains/main
Lion 8 bit
2023-04-11 07:22:45 -07:00
Tim Dettmers
d3e0e39def
Merge pull request #190 from svgsponer/Fix#157
Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist
2023-04-11 07:20:16 -07:00
Tim Dettmers
c7875533ce
Merge pull request #213 from tonylins/dev/fix_no_absmax
Fix a bug in (de)quantize_no_absmax with multiple GPUs
2023-04-11 07:18:24 -07:00
Tim Dettmers
6b4c5afe21
Merge pull request #260 from rapsealk/fix_libsbitsandbytes_cpu_so
Fixed typo libsbitsandbytes_cpu.so
2023-04-11 07:15:42 -07:00
Tim Dettmers
72efa32962
Merge pull request #292 from justheuristic/patch-2
Support nvidia16 GPUs
2023-04-11 07:14:12 -07:00
justheuristic
5e456be50e
Support 1650, 1660 2023-04-10 21:26:52 +03:00
Jeongseok Kang
8cceff72db Fixed typo libsbitsandbytes_cpu.so 2023-04-05 09:28:41 +09:00
Ji Lin
b6383ba116 fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus 2023-03-22 22:14:57 -04:00
Phil Wang
2a6828e6fb fix comment 2023-03-22 09:56:50 -07:00
Phil Wang
978ba2db57 another tab/spaces fix 2023-03-22 09:33:47 -07:00
Phil Wang
916000c8bf fix consistent tabs / spaces 2023-03-22 09:27:13 -07:00
Phil Wang
aa9b939edd add some comments, and fix use of g_val 2023-03-22 09:22:19 -07:00
Phil Wang
a43cd2008d add some code in test_optim.py, although it seems to be failing 2023-03-22 09:14:05 -07:00
Phil Wang
9b656f461a follow advice of Tim to fix update of momentum vs parameters in blockwise 8 bit 2023-03-22 07:52:59 -07:00
Max Ryabinin
dcecbb26ca Add force_no_igemmlt to test params 2023-03-22 00:28:49 +01:00
Tim Dettmers
49a04253fb Bumped version for CUDA 12.1 support release. 2023-03-21 15:10:19 -07:00
Tim Dettmers
d032618d7f
Merge pull request #180 from ubik2/patch-1
Update compile_from_source.md to mention cuda12x target
2023-03-21 14:08:32 -07:00
Tim Dettmers
1b0aabc7e4 Added CUDA 12.1. addressing #201 2023-03-21 14:06:08 -07:00
Tim Dettmers
2c8352e316 Bumped version. 2023-03-12 10:24:25 -07:00
Tim Dettmers
ec5fbf4cc4
Merge pull request #115 from kashif/patch-1
Fix for python 3.7
2023-03-12 10:22:15 -07:00
Severin Gsponer
c4866ab06e Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist 2023-03-11 15:35:23 +01:00
Phil Wang
369a51c432 switch all eps to beta2 2023-03-10 14:08:35 -08:00
Phil Wang
6c377b39b6 always pass beta2 into all the 1state functions 2023-03-10 13:00:59 -08:00
Phil Wang
abbe65adfc beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise 2023-03-10 12:50:14 -08:00
Phil Wang
19b9ef34b9 whoops 2023-03-10 08:59:49 -08:00
Phil Wang
c99b44f774 do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) 2023-03-10 08:57:59 -08:00
Phil Wang
8618bed001 swap the order in which momentum and parameters are updated in ops.cu 2023-03-10 08:39:06 -08:00
Phil Wang
c5582724d5 missed adagrad 2023-03-09 14:05:45 -08:00
Phil Wang
af03430992 fix weight decay for lion to be decoupled, using a switch 2023-03-09 14:03:07 -08:00
Phil Wang
ead570a43e remove something rmsprop specific 2023-03-09 11:58:31 -08:00
Phil Wang
c83888aa1a use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions 2023-03-09 11:54:54 -08:00
Phil Wang
64bb1ae8d1 add a sign function, for lion 2023-03-09 11:10:28 -08:00
Phil Wang
8de29fc364 forget about tests for now, will test live on local enwik8 training 2023-03-09 10:11:32 -08:00
Phil Wang
cb4c3c8c66 do a bunch of typical bookkeeping before getting to main lion logic 2023-03-09 10:10:19 -08:00
Phil Wang
d43ea9722c make sure interface is correct 2023-03-09 09:45:33 -08:00
Phil Wang
7247cb4554 initial commit, slowly work from interface into the kernel 2023-03-09 08:08:46 -08:00
ubik2
dba11b0b2e
Update compile_from_source.md
Add cuda12x to the list of targets
2023-03-06 16:57:57 -08:00
Max Ryabinin
24609b66af Reduce diff 2023-02-25 06:24:58 +01:00
Max Ryabinin
d15822a54b Refactor _tile_indices into a cached property, fix device bug 2023-02-25 06:23:07 +01:00