Author | Commit | Message | Date
Tim Dettmers | 0d332a641f | Added normal with extra value. | 2023-04-02 14:09:08 -07:00
Tim Dettmers | 2dd5d69056 | Generalized FP4 data type. | 2023-04-02 12:42:01 -07:00
Mitchell Wortsman | eb6c53cf55 | clarify in readme | 2023-04-01 23:50:12 +00:00
Tim Dettmers | 51a21df728 | Added 8-bit compression to quantization statistics. | 2023-04-01 16:10:18 -07:00
Mitchell Wortsman | 2331212b35 | add readme for speed bench | 2023-04-01 19:13:15 +00:00
Mitchell Wortsman | 7f87ba83ee | cleaning and refactor | 2023-04-01 18:46:04 +00:00
Tim Dettmers | c4cfe4fbdd | Added bf16 Adam. | 2023-04-01 10:33:03 -07:00
Tim Dettmers | 30d21d585c | Added triton test. | 2023-03-31 11:33:26 -07:00
Tim Dettmers | a13a522c4c | Added first triton test. | 2023-03-31 11:20:54 -07:00
Tim Dettmers | 8645d1f71c | Added normal quant. | 2023-03-29 18:41:37 -07:00
Mitchell Wortsman | b373034e31 | test | 2023-03-29 19:04:53 +00:00
Mitchell Wortsman | 5f3d9ada8d | triton-v1 | 2023-03-29 06:47:08 +00:00
Tim Dettmers | 69810521d3 | Some small changes. | 2023-03-27 09:12:57 -07:00
Mitchell Wortsman | 51f8bb7133 | pre-triton update | 2023-03-24 05:44:42 +00:00
Ji Lin | b6383ba116 | fix a bug in quantize_no_absmax and dequantize_no_absmax with multiple gpus | 2023-03-22 22:14:57 -04:00
Phil Wang | 2a6828e6fb | fix comment | 2023-03-22 09:56:50 -07:00
Phil Wang | 978ba2db57 | another tab/spaces fix | 2023-03-22 09:33:47 -07:00
Phil Wang | 916000c8bf | fix consistent tabs / spaces | 2023-03-22 09:27:13 -07:00
Phil Wang | aa9b939edd | add some comments, and fix use of g_val | 2023-03-22 09:22:19 -07:00
Phil Wang | a43cd2008d | add some code in test_optim.py, although it seems to be failing | 2023-03-22 09:14:05 -07:00
Phil Wang | 9b656f461a | follow advice of Tim to fix update of momentum vs parameters in blockwise 8 bit | 2023-03-22 07:52:59 -07:00
Max Ryabinin | dcecbb26ca | Add force_no_igemmlt to test params | 2023-03-22 00:28:49 +01:00
Tim Dettmers | 49a04253fb | Bumped version for CUDA 12.1 support release. | 2023-03-21 15:10:19 -07:00
Tim Dettmers | d032618d7f | Merge pull request #180 from ubik2/patch-1: Update compile_from_source.md to mention cuda12x target | 2023-03-21 14:08:32 -07:00
Tim Dettmers | 1b0aabc7e4 | Added CUDA 12.1. addressing #201 | 2023-03-21 14:06:08 -07:00
Tim Dettmers | 2c8352e316 | Bumped version. | 2023-03-12 10:24:25 -07:00
Tim Dettmers | ec5fbf4cc4 | Merge pull request #115 from kashif/patch-1: Fix for python 3.7 | 2023-03-12 10:22:15 -07:00
Severin Gsponer | c4866ab06e | Fix #157; Add XDG_GREETER_DATA_DIR to ignorelist | 2023-03-11 15:35:23 +01:00
Phil Wang | 369a51c432 | switch all eps to beta2 | 2023-03-10 14:08:35 -08:00
Phil Wang | 6c377b39b6 | always pass beta2 into all the 1state functions | 2023-03-10 13:00:59 -08:00
Phil Wang | abbe65adfc | beta2 is actually accessible in kOptimizerStatic8bit1StateBlockwise | 2023-03-10 12:50:14 -08:00
Phil Wang | 19b9ef34b9 | whoops | 2023-03-10 08:59:49 -08:00
Phil Wang | c99b44f774 | do the epsilon beta2 switcharoo within the cuda code, and not within the python class (so that the state dict still makes sense) | 2023-03-10 08:57:59 -08:00
Phil Wang | 8618bed001 | swap the order in which momentum and parameters are updated in ops.cu | 2023-03-10 08:39:06 -08:00
Phil Wang | c5582724d5 | missed adagrad | 2023-03-09 14:05:45 -08:00
Phil Wang | af03430992 | fix weight decay for lion to be decoupled, using a switch | 2023-03-09 14:03:07 -08:00
Phil Wang | ead570a43e | remove something rmsprop specific | 2023-03-09 11:58:31 -08:00
Phil Wang | c83888aa1a | use epsilon as beta2 for lion, complete most of the logic in kernel.cu for all functions | 2023-03-09 11:54:54 -08:00
Phil Wang | 64bb1ae8d1 | add a sign function, for lion | 2023-03-09 11:10:28 -08:00
Phil Wang | 8de29fc364 | forget about tests for now, will test live on local enwik8 training | 2023-03-09 10:11:32 -08:00
Phil Wang | cb4c3c8c66 | do a bunch of typical bookkeeping before getting to main lion logic | 2023-03-09 10:10:19 -08:00
Phil Wang | d43ea9722c | make sure interface is correct | 2023-03-09 09:45:33 -08:00
Phil Wang | 7247cb4554 | initial commit, slowly work from interface into the kernel | 2023-03-09 08:08:46 -08:00
ubik2 | dba11b0b2e | Update compile_from_source.md: Add cuda12x to the list of targets | 2023-03-06 16:57:57 -08:00
Artidoro Pagnoni | 6c31a5fe99 | t5 model fix | 2023-02-27 14:23:21 -08:00
Max Ryabinin | 24609b66af | Reduce diff | 2023-02-25 06:24:58 +01:00
Max Ryabinin | d15822a54b | Refactor _tile_indices into a cached property, fix device bug | 2023-02-25 06:23:07 +01:00
Max Ryabinin | cc608c04c2 | Revert the layout if weights were reordered | 2023-02-25 06:02:06 +01:00
Max Ryabinin | cd4d904a4c | Raise an error when loading a quantized checkpoint before quantization | 2023-02-25 06:01:34 +01:00
Max Ryabinin | ac3ab281e3 | Handle more cases in test_linear_serialization | 2023-02-25 06:01:04 +01:00