Commit Graph

  • 8f30f9d7e8 Use rocm_agent_enumerator to determine GPU [deftdawg-detect-gpu] deftdawg 2023-04-25 05:01:31 +0000
  • b4f1a436de Use rocm_agent_enumerator to determine GPU deftdawg 2023-04-25 04:58:48 +0000
  • a2b6e49d26 Use rocm_agent_enumerator to determine GPU deftdawg 2023-04-25 04:57:40 +0000
  • f49686029a slight tweaks to make it easy to work for Arch Linux users btw [master] mrq 2023-03-03 02:53:16 +0000
  • aa49b0a6cd Also disable igemmlt for AMD GPUs 0cc4m 2023-02-16 22:18:52 +0100
  • 403557388d Fix merge conflict 0cc4m 2023-02-16 18:57:58 +0100
  • 0f5c394870 Added version 0.37.0. Tim Dettmers 2023-02-01 20:27:01 -0800
  • de53588934 Added Int8 matmul support for all GPUs. Full backward support. Tim Dettmers 2023-02-01 20:09:31 -0800
  • 92ab6a8d5f Merge pull request #119 from stas00/patch-1 Tim Dettmers 2023-02-01 19:21:36 -0800
  • c5372a8567 improve install instructions Stas Bekman 2023-01-05 13:34:51 -0800
  • 1341fb44ad Fixed issue where the CUDA SETUP was not printed. Tim Dettmers 2023-01-04 03:50:53 -0800
  • 3901ebf7ae Added CUDA 12.0 support; removed CC 3.0 support. Tim Dettmers 2023-01-04 02:28:33 -0800
  • b3de19218e Added error message for unexpected CUDA exception. Tim Dettmers 2023-01-03 06:57:07 -0800
  • 81990491ff Merge pull request #113 from Borzik/fix-warnings Tim Dettmers 2023-01-03 15:46:58 +0100
  • 9180b4cc11 Added additional error message for cudart error #85 Tim Dettmers 2023-01-03 06:44:11 -0800
  • dfb049f8e4 Added Python >= 3.8 requirement. Tim Dettmers 2023-01-03 06:20:06 -0800
  • 211ad594df Added error+instructions for unsupported CUDA 10.0 version #82 Tim Dettmers 2023-01-03 06:07:35 -0800
  • f3800bab75 import warn function Felix Borzik 2023-01-03 13:23:34 +0000
  • 9d353ca786 Merge pull request #87 from lostmsu/main Tim Dettmers 2023-01-02 13:22:45 +0100
  • 7a6563b6c8 Default to CPU library on CUDA error+small refactor. Tim Dettmers 2023-01-02 03:47:09 -0800
  • d9112dc55b Merge pull request #110 from BlackHC/cublaslt_version Tim Dettmers 2023-01-02 12:35:53 +0100
  • 336e24696c CUDASetup only executed once + fixed circular import. Tim Dettmers 2023-01-02 03:31:43 -0800
  • df9a9b0c4c Merge pull request #77 from Cyberes/main Tim Dettmers 2023-01-02 11:28:17 +0100
  • be5cecb88f Merge branch 'main' into main Tim Dettmers 2023-01-02 11:23:17 +0100
  • f0ec93d016 Merge pull request #76 from tomaarsen/cleanup Tim Dettmers 2023-01-02 11:19:28 +0100
  • c91f592ad7 Merge branch 'main' into cleanup Tim Dettmers 2023-01-02 11:19:16 +0100
  • ed17aa9a31 Don't mark it as failure though. blackhc 2022-12-29 23:50:48 +0000
  • 7b39a5511d Fix issue #97 blackhc 2022-12-29 23:47:21 +0000
  • 1b52f4243f fixed, works on gfx1030, do save RAM broncotc 2022-11-24 05:15:08 +0000
  • 2dcf38289d should be hippified, and all cuda checkes cleaned up, makefile not updated yet broncotc 2022-11-23 17:52:19 -0800
  • c059bd2848 Added additional blocksizes: {64, 128, 256}. Tim Dettmers 2022-11-20 14:18:15 -0800
  • eb028e6ebc Fixed k-bit quantization maps. Tim Dettmers 2022-11-19 07:24:03 -0800
  • b104ce3b62 Merge branch 'main' into cleanup Tom Aarsen 2022-11-17 15:22:29 +0100
  • 08fa2e7b01 Fixed bug in cpu quant; faster GPU dequant. Tim Dettmers 2022-11-07 18:06:18 -0800
  • 62a333ac40 Added pre/post calls do quantize_blockwise. Tim Dettmers 2022-11-06 17:17:51 -0800
  • e0e697b150 Fixed blockwise test and logic. Tim Dettmers 2022-11-06 16:36:31 -0800
  • 6bc2b992be Added blocksizes 2048, 1024, and 512 to blockwise quant. Tim Dettmers 2022-11-06 16:27:48 -0800
  • 2f2063bac2 Added k<256 quantile estimate. Tim Dettmers 2022-11-06 13:05:25 -0800
  • 98cbc4bc4f Added k-bit fp8 map. Tim Dettmers 2022-11-06 11:59:37 -0800
  • caf1832526 Added k-bit linear quantization. Tim Dettmers 2022-11-06 11:47:54 -0800
  • 62d39a237c add device and dtype parameters to StableEmbedding Victor Nova 2022-11-04 14:05:30 -0700
  • 1efb87d89d Added FP8 quantization map. Tim Dettmers 2022-11-03 19:49:50 -0700
  • 62c0bd2278 Fix several typos in logging and comments Tom Aarsen 2022-11-01 09:53:47 +0100
  • d504050ff7 Call isort over cuda_setup/main.py Tom Aarsen 2022-11-01 09:46:03 +0100
  • 30f28b94a0 Merge branch 'main' into cleanup Tom Aarsen 2022-11-01 09:43:49 +0100
  • 8d87c0b852 Fixed CUDA setup bugs, including #81. Tim Dettmers 2022-10-31 18:04:49 -0700
  • 8724c990c7 allow hiding of the welcome message adpkadspokasdk 2022-10-27 16:04:49 -0600
  • 2a91e15113 Remove outdated linter log Tom Aarsen 2022-10-27 20:50:49 +0200
  • 4844aef4ff Fixing bad error when GPU was not detected for #73. Tim Dettmers 2022-10-27 08:54:30 -0700
  • 96ab2af1ef Bump version. Tim Dettmers 2022-10-27 07:09:08 -0700
  • 29e239e4d1 Merge pull request #72 from tomaarsen/hotfix/uncalled_func Tim Dettmers 2022-10-27 07:06:54 -0700
  • c6dad28a08 Remove extraneous get_ptr calls Tom Aarsen 2022-10-27 13:53:16 +0200
  • 7727fa4c8c Remove f-prefix from strings that don't use formatting Tom Aarsen 2022-10-27 13:36:39 +0200
  • 54bd6ed1d6 Remove unused imports Tom Aarsen 2022-10-27 13:32:01 +0200
  • ef70f2adcd Fix bad indentation Tom Aarsen 2022-10-27 13:27:18 +0200
  • 697bd02c60 Resolve dangerous default value [] as argument Tom Aarsen 2022-10-27 13:25:51 +0200
  • b5cf706341 Removing unnecessary else's Tom Aarsen 2022-10-27 13:25:07 +0200
  • 4a05df34c2 Fix critical bug in PytorchLARS().step: Undefined variable Tom Aarsen 2022-10-27 13:19:09 +0200
  • f6978ae2a2 Fix critical bug in histogram_scatter_add_2d: Undefined variable Tom Aarsen 2022-10-27 13:16:53 +0200
  • 7a3c9af05d Sort imports Tom Aarsen 2022-10-27 13:15:21 +0200
  • 0b078403ee Simplify statements into equivalent, modern variants Tom Aarsen 2022-10-27 13:14:13 +0200
  • 1eec77d34c Remove trailing whitespace & ensure newline at EOF Tom Aarsen 2022-10-27 13:11:29 +0200
  • 31f6689504 Remove references to unused cli Tom Aarsen 2022-10-27 13:10:32 +0200
  • 4faf6cb7e9 Replace seemingly incorrect use of CUDA_RUNTIME_LIB Tom Aarsen 2022-10-26 09:43:57 +0200
  • c584482f1f Resolve cases of CUDASetup.get_instance not being called when used Tom Aarsen 2022-10-26 09:37:16 +0200
  • a371be302d Added CUDA SETUP instruction generator. Tim Dettmers 2022-10-25 08:01:19 -0700
  • 62e1649357 Bumped version. Fixes for diverse issues relating CUDA SETUP. Tim Dettmers 2022-10-24 14:47:56 -0700
  • df86625a93 Isolated CUDASetup logging; all tests green. Tim Dettmers 2022-10-24 11:54:25 -0700
  • b844e104b7 Updated docs (#32) and changelog. Tim Dettmers 2022-10-09 19:31:43 -0700
  • 62b6a9399d Added CUDA 11.8 install and deployment. Tim Dettmers 2022-10-09 19:02:28 -0700
  • ed2e3b9db4 Merge pull request #36 from tomaarsen/hotfix/os_error_name_too_long Tim Dettmers 2022-10-09 16:47:11 -0700
  • 76699b4a8d Merge pull request #37 from tomaarsen/hotfix/colab_just_cpu Tim Dettmers 2022-10-09 16:43:58 -0700
  • 7740c6e9c9 Fixed url in setup.py (#38), updated changelog. Tim Dettmers 2022-09-19 21:13:40 -0700
  • 439f2b0c10 Merge pull request #33 from dbaranchuk/memory-efficient-backward Tim Dettmers 2022-09-19 21:09:25 -0700
  • 76ce9aa6da try fp32 justheuristic 2022-09-20 06:51:25 +0300
  • 292a478716 set threshold Tim Dettmers 2022-09-20 06:42:05 +0300
  • a07825ac31 review justheuristic 2022-09-20 06:40:36 +0300
  • 9b7d307b8c review Tim Dettmers 2022-09-20 06:36:32 +0300
  • cff3a71599 cast device justheuristic 2022-09-18 01:26:25 +0300
  • 32a9a88f98 cast device justheuristic 2022-09-18 01:26:12 +0300
  • 01b4c6a048 cast device justheuristic 2022-09-18 01:25:56 +0300
  • e4086a2758 cast device justheuristic 2022-09-18 01:24:57 +0300
  • 725cc72993 cast device justheuristic 2022-09-18 01:24:44 +0300
  • 28a9313ddc cast before allclose justheuristic 2022-09-18 01:24:27 +0300
  • 95dafc6475 cast before allclose justheuristic 2022-09-18 01:22:31 +0300
  • 37f805bb44 debug justheuristic 2022-09-18 01:21:12 +0300
  • 6a826c41a6 pre-cast justheuristic 2022-09-18 01:20:34 +0300
  • d9b8789818 debug justheuristic 2022-09-18 01:13:58 +0300
  • 5d65817101 debug justheuristic 2022-09-18 01:09:24 +0300
  • 4da2227fcb debug justheuristic 2022-09-18 01:03:21 +0300
  • 4b4a9effd1 debugprint justheuristic 2022-09-18 01:02:13 +0300
  • 7906dc4c9a debugpritn justheuristic 2022-09-18 00:57:26 +0300
  • 2cd047e35d run backward justheuristic 2022-09-18 00:55:53 +0300
  • 591f60395a add memory efficient backward justheuristic 2022-09-18 00:52:53 +0300
  • 579b8c782f reduce diff justheuristic 2022-09-18 00:47:58 +0300
  • 76ece2c126 rollback justheuristic 2022-09-18 00:43:56 +0300
  • 18f142e268 addmm_ justheuristic 2022-09-18 00:43:02 +0300
  • f6670329fb bump threshold to 0.21 justheuristic 2022-09-18 00:42:23 +0300
  • fa8e07c7c5 more lenient threshold justheuristic 2022-09-18 00:38:02 +0300
  • ab9dee062d cast edge case justheuristic 2022-09-18 00:36:46 +0300