Tim Dettmers | 486488bccb | Bumped version. | 2023-07-14 12:55:57 -07:00
Tim Dettmers | 6c6e5fcb53 | Added changelog entry. | 2023-07-14 12:55:04 -07:00
Tim Dettmers | 55f4c398a0 | Polished CUDA SETUP replacement and added docs. | 2023-07-14 12:50:59 -07:00
Tim Dettmers | 1ab6758b36 | Changed CUDA setup to use PyTorch default; added a weak test. | 2023-07-13 23:58:41 -07:00
Tim Dettmers | ac155f7415 | Merge branch 'main' into bugfixes | 2023-07-13 21:55:35 -07:00
Tim Dettmers | e8df8d64a2 | Merge pull request #375 from rapsealk/fix/libcuda-to-torch: Replace libcudart.so with PyTorch's CUDA APIs | 2023-07-13 21:54:47 -07:00
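The change behind #375 replaces ctypes-loading libcudart.so with PyTorch's own CUDA introspection. A minimal sketch of the approach, using a simplified helper rather than the library's actual setup code:

```python
# Illustrative sketch only: query the CUDA runtime through PyTorch instead
# of dlopen-ing libcudart.so via ctypes. Not the actual bitsandbytes code.
import torch

def cuda_version_via_torch():
    # torch.version.cuda is the CUDA version PyTorch was built against,
    # e.g. "11.8"; it is None on CPU-only builds.
    return torch.version.cuda

if torch.cuda.is_available():
    print("CUDA version (via PyTorch):", cuda_version_via_torch())
```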
Tim Dettmers | c00402f17e | Fixed a bug in absmax float conversion. | 2023-07-13 21:47:38 -07:00
Tim Dettmers | 6689afaec4 | Merge pull request #567 from apbard/patch-1: [BugFix] replace view+contiguous with reshape | 2023-07-13 21:45:00 -07:00
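#567 swaps the `x.contiguous().view(...)` pattern for `reshape(...)`, which copies only when the memory layout demands it. A small self-contained illustration (not the bitsandbytes call site itself):

```python
import torch

x = torch.arange(6).view(2, 3).t()  # transposing makes x non-contiguous
# x.view(-1) raises a RuntimeError on non-contiguous tensors, so the old
# pattern forced an explicit copy first:
flat_old = x.contiguous().view(-1)
# reshape() subsumes both steps and copies only when necessary:
flat_new = x.reshape(-1)
assert torch.equal(flat_old, flat_new)
```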
Tim Dettmers | 67475257a9 | Added documentation for NF4; failing 8-bit matmul; fixed absmax bug. #529 #543 | 2023-07-13 21:41:43 -07:00
Tim Dettmers | 8a20cd864b | Added missing scipy requirement. Addressing #544 | 2023-07-13 21:25:07 -07:00
Tim Dettmers | 097b1cc5da | Fixed bug caused by undefined default type of absmax. #553 | 2023-07-13 21:23:33 -07:00
Tim Dettmers | 7b6cfe1738 | Added H100 support for CUDA 11.8 precompiled binaries. | 2023-07-13 21:16:23 -07:00
Tim Dettmers | 817bdf6325 | Bumped version after hotfix. | 2023-07-11 17:16:05 -07:00
Tim Dettmers | 90b0ac57b0 | Fixed missing bias in bnb.matmul_4bit for inference; more tests. | 2023-07-11 17:13:33 -07:00
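The bias fix concerns the 4-bit inference path of `bnb.matmul_4bit`. A hedged usage sketch, with the argument order assumed from the public API rather than taken from this commit:

```python
import torch
import bitsandbytes as bnb
import bitsandbytes.functional as F

x = torch.randn(1, 768, dtype=torch.float16, device="cuda")
w = torch.randn(768, 768, dtype=torch.float16, device="cuda")
bias = torch.randn(768, dtype=torch.float16, device="cuda")

qw, state = F.quantize_4bit(w, quant_type="nf4")
# Before this fix, a bias passed on the batch-size-1 inference path was
# dropped; afterwards it is added to the output as expected.
y = bnb.matmul_4bit(x, qw.t(), quant_state=state, bias=bias)
```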
Tim Dettmers | dc96e9e7c8 | Test for bloom that fails with inference kernels. | 2023-07-11 15:40:20 -07:00
Tim Dettmers | ae7cd6ad14 | Bump version. | 2023-07-11 05:58:25 -07:00
Tim Dettmers | ba51d95d43 | Added more extensive gemv tests; blocksize guard for gemv. | 2023-07-11 05:55:49 -07:00
Tim Dettmers | b8da4a165a | Bumped version. | 2023-07-10 16:40:22 -07:00
Tim Dettmers | a26a321e07 | Removed debugging statement. | 2023-07-10 14:34:19 -07:00
Tim Dettmers | 306f6b2362 | Fixed accidental deletion of limits in kernel. | 2023-07-10 14:24:33 -07:00
Tim Dettmers | 2221f4cee0 | Fixed potential memory leak. | 2023-07-10 13:57:44 -07:00
Tim Dettmers | 490153b29f | Added generation tests. | 2023-07-10 12:19:16 -07:00
Tim Dettmers | 1c774ecebb | Added ARCH guard for bfloat16 computations. | 2023-07-10 09:53:23 -07:00
Tim Dettmers | 0a1cced375 | Fixed typo in cuda_install.sh. | 2023-07-10 06:40:19 -07:00
Tim Dettmers | 0d344b70ba | Changelog and version bump. | 2023-07-10 06:38:57 -07:00
Tim Dettmers | 73aa4e0a33 | Fixed Makefile and added CUDA 12.2 install. | 2023-07-10 06:34:04 -07:00
Tim Dettmers | 5f492d437e | Merge remote-tracking branch 'origin/inference' | 2023-07-10 06:24:24 -07:00
Tim Dettmers | 196d6f5dc1 | Merge pull request #469 from shadeMe/linear-layer-device: Add `device` parameter to `Linear` subclasses and `Embedding` | 2023-07-10 06:17:13 -07:00
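#469 mirrors `torch.nn.Linear(..., device=...)` so that bitsandbytes layers can be constructed directly on the target device instead of being built on CPU and moved afterwards. A hedged sketch (layer names from the public API; the exact keyword handling is assumed):

```python
import bitsandbytes as bnb

# With the new device parameter the layers land on GPU at construction
# time, avoiding a CPU allocation followed by .to("cuda"):
linear = bnb.nn.Linear8bitLt(768, 768, device="cuda")
embedding = bnb.nn.Embedding(32000, 768, device="cuda")
```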
Tim Dettmers | 5fab673442 | Added fp32 compute type for gemv_4bit. | 2023-07-09 21:06:01 -07:00
Tim Dettmers | cef519c89e | Added test for Param4bit.to() and fixed double quant behavior. | 2023-07-09 17:16:50 -07:00
Tim Dettmers | 6a905be5ce | Fixed a bug where gemv_4bit would return a wrongly sized tensor. | 2023-07-09 15:34:02 -07:00
Tim Dettmers | 0f0390acb2 | Added double quantization support and tests. | 2023-07-09 15:32:03 -07:00
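Double quantization quantizes the per-block absmax statistics themselves, trading a little extra dequantization work for memory savings. A minimal sketch against `bitsandbytes.functional`; the `compress_statistics` flag is, to my understanding, how the public API exposes this:

```python
import torch
import bitsandbytes.functional as F

w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
# compress_statistics=True enables double quantization: the fp32 absmax
# value stored per quantization block is itself quantized to 8 bits.
qw, state = F.quantize_4bit(w, quant_type="nf4", compress_statistics=True)
w_hat = F.dequantize_4bit(qw, state)
```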
Tim Dettmers | 94168d79d7 | Added FP4 fast inference support. | 2023-07-09 14:46:19 -07:00
Tim Dettmers | 4b88d69de7 | Added arbitrary data types; fixed a bug for small matrices. | 2023-07-09 12:04:09 -07:00
Tim Dettmers | eefbf60270 | Tuning optimization (float accumulation); 185 vs 50. | 2023-07-08 16:31:58 -07:00
Tim Dettmers | 7e49b5b938 | Added warp_shuffle indexing; 185 vs 54. | 2023-07-08 14:27:12 -07:00
Alessandro Pietro Bardelli | 463630dc73 | [BugFix] replace view+contiguous with reshape | 2023-07-06 12:26:03 +02:00
Jeongseok Kang | a24aae30bf | Merge branch 'main' into fix/libcuda-to-torch | 2023-07-06 15:43:42 +09:00
Tim Dettmers | 02fd80cb81 | Added bfloat16 quantizations and tests. | 2023-07-04 19:58:31 -07:00
Tim Dettmers | dfe6900b94 | Vectorized loads, conflict-free NF4; 52 vs 172. | 2023-07-04 15:20:10 -07:00
Tim Dettmers | f89ff93e26 | Initial naive 4-bit at batch size 1; 81 vs 185. | 2023-07-03 18:45:38 -07:00
Tim Dettmers | 4395d68cf6 | Release 0.39.1. | 2023-06-19 19:40:41 -07:00
Tim Dettmers | 2d321a7524 | Merge pull request #503 from TimDettmers/efficient_8bit_serialize: Make 8-bit serialization more memory-efficient (v2) | 2023-06-19 11:28:30 -07:00
Max Ryabinin | b599fdb197 | Only rearrange the weight if it exists | 2023-06-14 19:27:13 +02:00
Max Ryabinin | c1f3f56d2c | Rearrange the weights directly in the state dict before loading | 2023-06-09 21:58:39 +02:00
Max Ryabinin | f734076e94 | Improve memory efficiency of 8-bit serialization | 2023-06-09 21:39:57 +02:00
Max Ryabinin | 4fb37d45c1 | Extract get_tile_inds to a separate function | 2023-06-09 21:39:37 +02:00
shadeMe | db49ad43ab | Add device parameter to Embedding | 2023-06-01 17:43:49 +02:00
shadeMe | 9cac5dd1b6 | Add device parameter to Linear subclasses | 2023-06-01 17:43:30 +02:00
Tim Dettmers | e54d2730fc | Added debugging functions. | 2023-05-30 20:42:21 -07:00