Tim Dettmers | 72efa32962 | Merge pull request #292 from justheuristic/patch-2: Support nvidia16 GPUs | 2023-04-11 07:14:12 -07:00 |
justheuristic | 5e456be50e | Support 1650, 1660 | 2023-04-10 21:26:52 +03:00 |
Mitchell Wortsman | d677a71607 | typo | 2023-04-08 19:36:17 +00:00 |
Mitchell Wortsman | da524d97c9 | mem efficient" | 2023-04-08 19:34:18 +00:00 |
Tim Dettmers | e9fa03b717 | Some fixed for loading PEFT modules with Params4bit. | 2023-04-07 09:59:21 -07:00 |
Jeongseok Kang | 8cceff72db | Fixed typo libsbitsandbytes_cpu.so | 2023-04-05 09:28:41 +09:00 |
Tim Dettmers | 1ccb7bdec6 | Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure. | 2023-04-03 18:47:00 -07:00 |
Tim Dettmers | 4ea489d3bf | Refactor FP4 into 4Bit and integrate NF4 data type. | 2023-04-03 11:00:12 -07:00 |
Tim Dettmers | 64cc05920d | First draft of NF4. | 2023-04-02 16:10:35 -07:00 |
Tim Dettmers | 4ad999d144 | Added quantization tree generation. | 2023-04-02 14:42:45 -07:00 |
Tim Dettmers | 0d332a641f | Added normal with extra value. | 2023-04-02 14:09:08 -07:00 |
Tim Dettmers | 2dd5d69056 | Generalized FP4 data type. | 2023-04-02 12:42:01 -07:00 |
Mitchell Wortsman | eb6c53cf55 | clarify in readme | 2023-04-01 23:50:12 +00:00 |
Tim Dettmers | 51a21df728 | Added 8-bit compression to quantization statistics. | 2023-04-01 16:10:18 -07:00 |
Mitchell Wortsman | 2331212b35 | add readme for speed bench | 2023-04-01 19:13:15 +00:00 |
Mitchell Wortsman | 7f87ba83ee | cleaning and refactor | 2023-04-01 18:46:04 +00:00 |
Tim Dettmers | c4cfe4fbdd | Added bf16 Adam. | 2023-04-01 10:33:03 -07:00 |
Tim Dettmers | 30d21d585c | Added triton test. | 2023-03-31 11:33:26 -07:00 |
Tim Dettmers | a13a522c4c | Added first triton test. | 2023-03-31 11:20:54 -07:00 |
Tim Dettmers | 8645d1f71c | Added normal quant. | 2023-03-29 18:41:37 -07:00 |
Mitchell Wortsman | b373034e31 | test | 2023-03-29 19:04:53 +00:00 |
Mitchell Wortsman | 5f3d9ada8d | triton-v1 | 2023-03-29 06:47:08 +00:00 |
Tim Dettmers | 69810521d3 | Some small changes. | 2023-03-27 09:12:57 -07:00 |
Mitchell Wortsman | 51f8bb7133 | pre-triton update | 2023-03-24 05:44:42 +00:00 |
Ji Lin | b6383ba116 | fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus | 2023-03-22 22:14:57 -04:00 |
Phil Wang | 2a6828e6fb | fix comment | 2023-03-22 09:56:50 -07:00 |
Phil Wang | 978ba2db57 | another tab/spaces fix | 2023-03-22 09:33:47 -07:00 |
Phil Wang | 916000c8bf | fix consistent tabs / spaces | 2023-03-22 09:27:13 -07:00 |
Phil Wang | aa9b939edd | add some comments, and fix use of g_val | 2023-03-22 09:22:19 -07:00 |
Phil Wang | a43cd2008d | add some code in test_optim.py, although it seems to be failing | 2023-03-22 09:14:05 -07:00 |
Phil Wang | 9b656f461a | follow advice of Tim to fix update of momentum vs parameters in blockwise 8 bit | 2023-03-22 07:52:59 -07:00 |
Max Ryabinin | dcecbb26ca | Add force_no_igemmlt to test params | 2023-03-22 00:28:49 +01:00 |
Tim Dettmers | 49a04253fb | Bumped version for CUDA 12.1 support release. | 2023-03-21 15:10:19 -07:00 |
Tim Dettmers | d032618d7f | Merge pull request #180 from ubik2/patch-1: Update compile_from_source.md to mention cuda12x target | 2023-03-21 14:08:32 -07:00 |
Tim Dettmers | 1b0aabc7e4 | Added CUDA 12.1. addressing #201 | 2023-03-21 14:06:08 -07:00 |
Tim Dettmers | 2c8352e316 | Bumped version. | 2023-03-12 10:24:25 -07:00 |
Tim Dettmers | ec5fbf4cc4 | Merge pull request #115 from kashif/patch-1: Fix for python 3.7 | 2023-03-12 10:22:15 -07:00 |
Severin Gsponer | c4866ab06e | Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist | 2023-03-11 15:35:23 +01:00 |
Phil Wang | 369a51c432 | switch all eps to beta2 | 2023-03-10 14:08:35 -08:00 |
Phil Wang | 6c377b39b6 | always pass beta2 into all the 1state functions | 2023-03-10 13:00:59 -08:00 |
Phil Wang | abbe65adfc | beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise | 2023-03-10 12:50:14 -08:00 |
Phil Wang | 19b9ef34b9 | whoops | 2023-03-10 08:59:49 -08:00 |
Phil Wang | c99b44f774 | do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) | 2023-03-10 08:57:59 -08:00 |
Phil Wang | 8618bed001 | swap the order in which momentum and parameters are updated in ops.cu | 2023-03-10 08:39:06 -08:00 |
Phil Wang | c5582724d5 | missed adagrad | 2023-03-09 14:05:45 -08:00 |
Phil Wang | af03430992 | fix weight decay for lion to be decoupled, using a switch | 2023-03-09 14:03:07 -08:00 |
Phil Wang | ead570a43e | remove something rmsprop specific | 2023-03-09 11:58:31 -08:00 |
Phil Wang | c83888aa1a | use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions | 2023-03-09 11:54:54 -08:00 |
Phil Wang | 64bb1ae8d1 | add a sign function, for lion | 2023-03-09 11:10:28 -08:00 |
Phil Wang | 8de29fc364 | forget about tests for now, will test live on local enwik8 training | 2023-03-09 10:11:32 -08:00 |