Author | Commit | Message | Date
0cc4m | 403557388d | Fix merge conflict | 2023-02-16 22:18:27 +01:00
Tim Dettmers | c91f592ad7 | Merge branch 'main' into cleanup | 2023-01-02 11:19:16 +01:00
broncotc | 1b52f4243f | fixed, works on gfx1030, do save RAM | 2022-11-24 05:15:08 +00:00
broncotc | 2dcf38289d | should be hippified, and all cuda checkes cleaned up, makefile not updated yet | 2022-11-23 17:52:19 -08:00
Tim Dettmers | c059bd2848 | Added additional blocksizes: {64, 128, 256}. | 2022-11-20 14:18:15 -08:00
Tom Aarsen | b104ce3b62 | Merge branch 'main' into cleanup | 2022-11-17 15:22:29 +01:00
Tim Dettmers | 08fa2e7b01 | Fixed bug in cpu quant; faster GPU dequant. | 2022-11-07 18:06:18 -08:00
Tim Dettmers | 6bc2b992be | Added blocksizes 2048, 1024, and 512 to blockwise quant. | 2022-11-06 16:27:48 -08:00
Tom Aarsen | 1eec77d34c | Remove trailing whitespace & ensure newline at EOF | 2022-10-27 13:11:29 +02:00
Tim Dettmers | c05dd42ddd | Fixed cpu blockwise quantization for small input tensors. | 2022-09-13 10:37:53 -07:00
Tim Dettmers | 19a7adca7a | Fixed 2^31 max size issue for cpu blockwise quant. | 2022-09-11 11:55:09 -07:00
Tim Dettmers | ee5b947e63 | Fixed issue where Pascal was not displaying proper error. | 2022-08-23 16:00:26 -07:00
Tim Dettmers | a6664de072 | Enhanced error handling in CUDA SETUP failures. | 2022-08-16 19:03:19 -07:00
Tim Dettmers | dede343033 | Added fused bias in dequant_mm. | 2022-08-16 11:12:09 -07:00
Tim Dettmers | 1ed2fa2f21 | Removed storage() from get_ptr; added boilerplate for bias dequant_mm. | 2022-08-16 10:56:17 -07:00
Tim Dettmers | a4532c59f7 | Removed faulty asserts. | 2022-08-06 09:31:05 -07:00
Tim Dettmers | cc5b323876 | Merge branch 'extract_outliers' into debug | 2022-08-04 07:40:48 -07:00
Tim Dettmers | 451fd9506e | Added fixes for the case that matmullt dim A is zero, e.g. [0, 768]. | 2022-08-03 11:54:01 -07:00
Tim Dettmers | 2f01865a2f | Added CUDA block assert and is_on_gpu check. | 2022-08-03 09:05:37 -07:00
Tim Dettmers | 5737f2b027 | Merge branch 'patch_merge' into extract_outliers | 2022-07-26 19:38:01 -07:00
Tim Dettmers | 32fa459ed7 | Added col_ampere outlier extraction kernel. | 2022-07-26 18:15:51 -07:00
Tim Dettmers | bcab99ec87 | Working outlier extraction for Turing. | 2022-07-26 17:39:30 -07:00
Tim Dettmers | cbb901ac51 | Boilerplate and test for extract_outliers. | 2022-07-26 12:12:38 -07:00
Tim Dettmers | 953b7285dd | Fixed cpuonly build. | 2022-07-26 09:12:16 -07:00
Tim Dettmers | 9268dc9d88 | Some progress on build script; added multi-cuda install script. | 2022-07-25 19:30:37 -07:00
Tim Dettmers | 8b1fd32e3e | Fixed makefile; fixed Ampere igemmlt_8 bug. | 2022-07-25 14:02:14 -07:00
Tim Dettmers | 7d2ecd30c0 | Fixed rowcol synchronization bug. | 2022-07-22 15:21:37 -07:00
Tim Dettmers | c771b3a75a | Most tests passing. | 2022-07-22 14:41:05 -07:00
Max Ryabinin | 025824d29b | Reduce diff | 2022-07-01 17:42:58 +03:00
Max Ryabinin | 575aa698fa | Reduce diff | 2022-07-01 17:41:48 +03:00
Max Ryabinin | 4d1d5b569f | Reduce diff | 2022-07-01 17:40:02 +03:00
Max Ryabinin | 31ce1b3708 | Reduce diff | 2022-07-01 17:36:30 +03:00
Max Ryabinin | 8258b4364a | Add a CPU-only build option | 2022-07-01 17:16:10 +03:00
Tim Dettmers | 2f8083bd8b | Added AdamW. #10 #13 | 2021-11-28 21:18:11 -08:00
Tim Dettmers | 8b3c0f355c | Added adagrad with tests (no clipping). | 2021-11-10 15:10:02 -08:00
Tim Dettmers | 0fb378b4ee | Added compilation from source instructions; easier compilation. | 2021-10-21 17:22:43 -07:00
Tim Dettmers | a6eae2e7f2 | Added skip_zeros; tests are passing. | 2021-10-20 19:15:47 -07:00
Tim Dettmers | bb34fd50a1 | Initial plumbing for skip_zeros. | 2021-10-20 18:37:44 -07:00
Tim Dettmers | 7439924891 | Initial commit | 2021-10-05 19:16:20 -07:00