Tim Dettmers
|
c472bd56f0
|
Added the case that all env variables are empty (CUDA docker).
|
2022-08-05 08:57:52 -07:00 |
|
Tim Dettmers
|
6ad8796cfc
|
Bumping version for TestPyPi release.
|
2022-08-05 07:17:36 -07:00 |
|
Tim Dettmers
|
e35337f05e
|
Now determining cuda version via libcudart.so call.
|
2022-08-05 07:13:24 -07:00 |
|
Tim Dettmers
|
8f84674d67
|
Fixed bugs in cuda setup.
|
2022-08-04 09:16:00 -07:00 |
|
Tim Dettmers
|
758c7175a2
|
Merge branch 'debug' into cuda-bin-switch-and-cli
|
2022-08-04 08:03:00 -07:00 |
|
Tim Dettmers
|
ab72a1294f
|
Added pre/post device call for extract outliers.
|
2022-08-04 07:47:22 -07:00 |
|
Tim Dettmers
|
cc5b323876
|
Merge branch 'extract_outliers' into debug
|
2022-08-04 07:40:48 -07:00 |
|
Tim Dettmers
|
6101a8fb9f
|
Added pre and post device call to transform.
|
2022-08-04 07:28:12 -07:00 |
|
Tim Dettmers
|
320eacb4c2
|
Removed print statement.
|
2022-08-03 14:17:54 -07:00 |
|
Tim Dettmers
|
451fd9506e
|
Added fixes for the case that matmullt dim A is zero, e.g. [0, 768].
|
2022-08-03 11:54:01 -07:00 |
|
Tim Dettmers
|
2f01865a2f
|
Added CUDA block assert and is_on_gpu check.
|
2022-08-03 09:05:37 -07:00 |
|
Titus von Koeller
|
96bc209baf
|
tentative refactoring of the compute capabilities code
|
2022-08-02 21:27:36 -07:00 |
|
Titus von Koeller
|
59a615b386
|
factored cuda_setup.main out into smaller modules and functions
|
2022-08-02 21:26:50 -07:00 |
|
Titus von Koeller
|
3809236428
|
move cuda_setup code into subpackage
|
2022-08-02 07:42:27 -07:00 |
|
Tim Dettmers
|
e120c4a550
|
Fixed syntax error; bumped revision for beta release.
|
2022-08-01 20:05:03 -07:00 |
|
Tim Dettmers
|
3479d02a76
|
Added some more docs and comments.
|
2022-08-01 19:43:09 -07:00 |
|
Tim Dettmers
|
8bf3e9faab
|
Added full env variable search; CONDA_PREFIX priority.
|
2022-08-01 19:22:41 -07:00 |
|
Titus von Koeller
|
c4fe6c69a3
|
deleted function that was moved but accidentally not removed in commit
|
2022-08-01 09:40:41 -07:00 |
|
Titus von Koeller
|
ea7c14f8ef
|
reran black with linelength 80 for greater readability
|
2022-08-01 09:32:47 -07:00 |
|
Titus von Koeller
|
3fd06fb620
|
refactored subshell execution code for greater readability and moved it to utils
|
2022-08-01 09:30:29 -07:00 |
|
Titus von Koeller
|
54efd874a8
|
flake8 found some stuff that needs fixing before the release
|
2022-08-01 03:32:34 -07:00 |
|
Titus von Koeller
|
bfa0e33294
|
ran black and isort for coherent code formatting
|
2022-08-01 03:31:48 -07:00 |
|
Titus von Koeller
|
597a8521b2
|
fix typo
|
2022-08-01 03:22:44 -07:00 |
|
Titus von Koeller
|
57fa64628f
|
minor refactor to more concise syntax
|
2022-08-01 03:22:12 -07:00 |
|
Tim Dettmers
|
4a6ea7e24b
|
Added adjusted build file.
|
2022-07-31 20:59:34 -07:00 |
|
Tim Dettmers
|
28d1e7dc01
|
Initial build script changes (untested on PyPi).
|
2022-07-31 19:41:56 -07:00 |
|
Tim Dettmers
|
dd50382b32
|
Full evaluate_cuda setup with integration test.
|
2022-07-31 17:47:44 -07:00 |
|
Titus von Koeller
|
5d90b38c4d
|
adding CLI tool for CUDA install debugging - intermediate commit
|
2022-07-27 21:16:04 -07:00 |
|
Tim Dettmers
|
bd515328d7
|
Fixed deployment script to check for LD_LIBRARY_PATH.
|
2022-07-27 05:57:50 -07:00 |
|
Tim Dettmers
|
389f66ca5a
|
Fixed direct extraction masking.
|
2022-07-27 01:46:35 -07:00 |
|
Tim Dettmers
|
a409213656
|
Fixed make default to compile with cublaslt.
|
2022-07-26 19:38:17 -07:00 |
|
Tim Dettmers
|
5737f2b027
|
Merge branch 'patch_merge' into extract_outliers
|
2022-07-26 19:38:01 -07:00 |
|
Tim Dettmers
|
47a73d94c3
|
Matmullt with direct outlier extraction for 8-bit inference.
|
2022-07-26 19:15:35 -07:00 |
|
Tim Dettmers
|
32fa459ed7
|
Added col_ampere outlier extraction kernel.
|
2022-07-26 18:15:51 -07:00 |
|
Tim Dettmers
|
bcab99ec87
|
Working outlier extraction for Turing.
|
2022-07-26 17:39:30 -07:00 |
|
Tim Dettmers
|
cbb901ac51
|
Boilerplate and test for extract_outliers.
|
2022-07-26 12:12:38 -07:00 |
|
Tim Dettmers
|
dc8c9efdb3
|
Changed setup.py; deployed on test pypi.
|
2022-07-26 10:32:22 -07:00 |
|
Tim Dettmers
|
953b7285dd
|
Fixed cpuonly build.
|
2022-07-26 09:12:16 -07:00 |
|
Tim Dettmers
|
f2dd703251
|
Added matmul build and flags.
|
2022-07-25 22:34:14 -07:00 |
|
Tim Dettmers
|
9268dc9d88
|
Some progress on build script; added multi-cuda install script.
|
2022-07-25 19:30:37 -07:00 |
|
Tim Dettmers
|
1e88edd8c0
|
Removed rowscale (segfaults on ampere).
|
2022-07-25 17:27:57 -07:00 |
|
Tim Dettmers
|
8b1fd32e3e
|
Fixed makefile; fixed Ampere igemmlt_8 bug.
|
2022-07-25 14:02:14 -07:00 |
|
Tim Dettmers
|
7d2ecd30c0
|
Fixed rowcol synchronization bug.
|
2022-07-22 15:21:37 -07:00 |
|
Tim Dettmers
|
c771b3a75a
|
Most tests passing.
|
2022-07-22 14:41:05 -07:00 |
|
Tim Dettmers
|
4cd7ea62b2
|
Merge pull request #3 from TimDettmers/cpuonly: Add a CPU-only build option
|
2022-07-18 09:51:37 -07:00 |
|
Max Ryabinin
|
fd750cd237
|
Update README.md
|
2022-07-01 17:46:29 +03:00 |
|
Max Ryabinin
|
025824d29b
|
Reduce diff
|
2022-07-01 17:42:58 +03:00 |
|
Max Ryabinin
|
575aa698fa
|
Reduce diff
|
2022-07-01 17:41:48 +03:00 |
|
Max Ryabinin
|
4d1d5b569f
|
Reduce diff
|
2022-07-01 17:40:02 +03:00 |
|
Max Ryabinin
|
31ce1b3708
|
Reduce diff
|
2022-07-01 17:36:30 +03:00 |
|
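
Commit e35337f05e above ("Now determining cuda version via libcudart.so call.") describes reading the CUDA version from the runtime library itself. The log does not show how the repository does this, so the following is only a minimal sketch of the general idea and not the project's code. It assumes the library can be loaded under the plain name `libcudart.so`. The helper names `decode_cuda_version` and `query_cuda_runtime_version` are hypothetical. `cudaRuntimeGetVersion` is the CUDA Runtime API call, which reports the version as a single integer, 1000 * major + 10 * minor.

```python
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """Split the integer from cudaRuntimeGetVersion into (major, minor).

    The CUDA runtime encodes its version as 1000 * major + 10 * minor,
    so 11070 stands for CUDA 11.7.
    """
    major, rest = divmod(raw, 1000)
    return major, rest // 10


def query_cuda_runtime_version(
    lib_name: str = "libcudart.so",  # assumption: many systems only ship a versioned name
) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the CUDA runtime, or None if it is unavailable."""
    try:
        cudart = ctypes.CDLL(lib_name)
    except OSError:
        # Library not found or not loadable (for example on a CPU-only machine).
        return None

    version = ctypes.c_int(0)
    # cudaError_t cudaRuntimeGetVersion(int *runtimeVersion); 0 means cudaSuccess.
    status = cudart.cudaRuntimeGetVersion(ctypes.byref(version))
    if status != 0:
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    print(query_cuda_runtime_version())
```

On a machine without a loadable CUDA runtime the query returns `None`, so a caller can fall back to a CPU-only code path.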