Tim Dettmers | 7140c01405 | Merge branch 'main' into fp8_merge | 2023-04-12 11:44:39 -07:00
Tim Dettmers | dd562c24f1 | Refactored simulated fp8 modules into research.nn. | 2023-04-12 11:24:44 -07:00
Tim Dettmers | ec1ea63711 | Refactored triton into its own folder. Refactored fp8 matmuls. | 2023-04-12 09:39:39 -07:00
Tim Dettmers | 4cd63deff3 | Fixed CUDA Conda PyTorch 2.0 issues. | 2023-04-11 12:10:20 -07:00
Tim Dettmers | 2eb3108356 | Fixed bug where beta2 was not passed into Lion 32-bit. | 2023-04-11 09:16:01 -07:00
Tim Dettmers | 792af5c883 | Fixed noisy tests for 8-bit Lion. | 2023-04-11 08:42:41 -07:00
Tim Dettmers | ed6f3eb146 | Merge pull request #159 from TimDettmers/serialize_8bit: Implement proper serialization of Linear8bitLt | 2023-04-11 07:24:51 -07:00
Mitchell Wortsman | 7f87ba83ee | cleaning and refactor | 2023-04-01 18:46:04 +00:00
Tim Dettmers | 30d21d585c | Added triton test. | 2023-03-31 11:33:26 -07:00
Tim Dettmers | a13a522c4c | Added first triton test. | 2023-03-31 11:20:54 -07:00
Mitchell Wortsman | b373034e31 | test | 2023-03-29 19:04:53 +00:00
Mitchell Wortsman | 5f3d9ada8d | triton-v1 | 2023-03-29 06:47:08 +00:00
Phil Wang | a43cd2008d | add some code in test_optim.py, although it seems to be failing | 2023-03-22 09:14:05 -07:00
Max Ryabinin | dcecbb26ca | Add force_no_igemmlt to test params | 2023-03-22 00:28:49 +01:00
Phil Wang | 8de29fc364 | forget about tests for now, will test live on local enwik8 training | 2023-03-09 10:11:32 -08:00
Phil Wang | cb4c3c8c66 | do a bunch of typical bookkeeping before getting to main lion logic | 2023-03-09 10:10:19 -08:00
Max Ryabinin | ac3ab281e3 | Handle more cases in test_linear_serialization | 2023-02-25 06:01:04 +01:00
Tim Dettmers | c5c38ca19c | Added matmul_mixed. | 2023-02-23 10:45:18 -08:00
Max Ryabinin | 58b09ee1b1 | [WIP] Implement proper serialization of Linear8bitLt | 2023-02-21 12:04:47 +01:00
Tim Dettmers | 2489d819c5 | Added more blocksizes for stochastic rounding; fixed dequant blocksize. | 2023-02-14 13:55:17 -08:00
Tim Dettmers | 2dfa3ce16d | Fixed LinearFP8 and added tests. | 2023-02-13 17:48:52 -08:00
Tim Dettmers | ca3236587a | Added forward/backward tests; removed bias. | 2023-02-13 17:20:52 -08:00
Tim Dettmers | 6bdb6c351e | Added fp8 simulation layer. | 2023-02-13 16:53:07 -08:00
Tim Dettmers | de53588934 | Added Int8 matmul support for all GPUs. Full backward support. | 2023-02-01 20:09:31 -08:00
Tim Dettmers | c9f505064e | Added outlier detector and fake quantization layer. | 2023-01-28 17:05:22 -08:00
Tim Dettmers | 336e24696c | CUDASetup only executed once + fixed circular import. | 2023-01-02 03:31:43 -08:00
Tim Dettmers | c91f592ad7 | Merge branch 'main' into cleanup | 2023-01-02 11:19:16 +01:00
Tim Dettmers | eb028e6ebc | Fixed k-bit quantization maps. | 2022-11-19 07:24:03 -08:00
Tom Aarsen | b104ce3b62 | Merge branch 'main' into cleanup | 2022-11-17 15:22:29 +01:00
Tim Dettmers | 08fa2e7b01 | Fixed bug in cpu quant; faster GPU dequant. | 2022-11-07 18:06:18 -08:00
Tim Dettmers | e0e697b150 | Fixed blockwise test and logic. | 2022-11-06 16:36:31 -08:00
Tim Dettmers | 6bc2b992be | Added blocksizes 2048, 1024, and 512 to blockwise quant. | 2022-11-06 16:27:48 -08:00
Tim Dettmers | 2f2063bac2 | Added k<256 quantile estimate. | 2022-11-06 13:05:25 -08:00
Tim Dettmers | 98cbc4bc4f | Added k-bit fp8 map. | 2022-11-06 11:59:37 -08:00
Tim Dettmers | caf1832526 | Added k-bit linear quantization. | 2022-11-06 11:47:54 -08:00
Tim Dettmers | 1efb87d89d | Added FP8 quantization map. | 2022-11-03 19:49:50 -07:00
Tom Aarsen | 7a3c9af05d | Sort imports via isort | 2022-10-27 13:15:21 +02:00
Tom Aarsen | 0b078403ee | Simplify statements into equivalent, modern variants via pyupgrade --py37-plus. The changes cover, e.g., explicit subclassing from object, calling super() as super(ThisClass, self), and old-style string formatting. | 2022-10-27 13:14:13 +02:00
Tom Aarsen | 1eec77d34c | Remove trailing whitespace & ensure newline at EOF | 2022-10-27 13:11:29 +02:00
Tim Dettmers | a371be302d | Added CUDA SETUP instruction generator. | 2022-10-25 08:01:19 -07:00
Tim Dettmers | df86625a93 | Isolated CUDASetup logging; all tests green. | 2022-10-24 11:54:25 -07:00
justheuristic | 76ce9aa6da | try fp32 | 2022-09-20 06:51:25 +03:00
Tim Dettmers | 292a478716 | set threshold | 2022-09-20 06:42:05 +03:00
justheuristic | a07825ac31 | review | 2022-09-20 06:40:36 +03:00
justheuristic | cff3a71599 | cast device | 2022-09-18 01:26:25 +03:00
justheuristic | 32a9a88f98 | cast device | 2022-09-18 01:26:12 +03:00
justheuristic | 01b4c6a048 | cast device | 2022-09-18 01:25:56 +03:00
justheuristic | e4086a2758 | cast device | 2022-09-18 01:24:57 +03:00
justheuristic | 725cc72993 | cast device | 2022-09-18 01:24:44 +03:00
justheuristic | 28a9313ddc | cast before allclose | 2022-09-18 01:24:27 +03:00