Commit Graph

43 Commits

Author SHA1 Message Date
mrq
c88f97a9c8 drop support for gfx903 because depending on hipblaslt gums up too many things 2023-10-12 19:16:14 -05:00
arlo-phoenix
d10197bc93 Add HIP to cuda defines 2023-08-05 02:11:46 +02:00
  collected by hipifying all files and then comparing with original Cuda file
Tim Dettmers
7be5f2c7b3 Guard for prefetchAsync GPU capability. #470 #451 #477 2023-07-16 21:12:03 -07:00
Tim Dettmers
5fab673442 Added fp32 compute type for gemv_4bit. 2023-07-09 21:06:01 -07:00
Tim Dettmers
4b88d69de7 Added abitrary data types; fixed a bug for small matrices. 2023-07-09 12:04:09 -07:00
Tim Dettmers
02fd80cb81 Added bfloat16 quantizations and tests. 2023-07-04 19:58:31 -07:00
Tim Dettmers
f89ff93e26 Initial 4-bit naive batch size 1, 81 vs 185. 2023-07-03 18:45:38 -07:00
Tim Dettmers
1b8772a8f3 Added PagedLion and bf16 Lion. 2023-05-23 19:37:38 -07:00
Tim Dettmers
675baa79d2 Merge remote-tracking branch 'origin/main' into merge 2023-05-07 13:34:03 -07:00
Tim Dettmers
ec38ba95b0 Added paging. 2023-05-06 11:14:06 -07:00
Tim Dettmers
ad07d254fb Slow tensor core solution. 2023-04-30 17:43:02 -07:00
Tim Dettmers
21723f796a 4-bit draft. 2023-04-29 21:52:47 -07:00
Tim Dettmers
cad839941b Added bit template. 2023-04-28 22:10:42 -07:00
Tim Dettmers
f3e97ccbd2 New implementation for batch size 1. 2023-04-28 21:29:40 -07:00
Tim Dettmers
f6df4aef6a Added fp16 and thread/item template. 2023-04-28 18:26:52 -07:00
Tim Dettmers
3aef78342a Added template refactor. 2023-04-28 17:34:08 -07:00
Tim Dettmers
c1bfb210c5 First baseline kernel. 2023-04-28 17:19:02 -07:00
Tim Dettmers
9cab14a3ff Adedd pipeline draft. 2023-04-27 15:12:49 -07:00
Tim Dettmers
0afc8e9e2f Best attempt at cutlass3. 2023-04-26 17:12:34 -07:00
Tim Dettmers
7dc198feb7 Added 32-bit optimizer for bfloat16 gradients. 2023-04-17 18:01:49 -07:00
Tim Dettmers
7140c01405 Merge branch 'main' into fp8_merge 2023-04-12 11:44:39 -07:00
Tim Dettmers
64cc05920d First draft of NF4. 2023-04-02 16:10:35 -07:00
Tim Dettmers
c4cfe4fbdd Added bf16 Adam. 2023-04-01 10:33:03 -07:00
Phil Wang
cb4c3c8c66 do a bunch of typical bookkeeping before getting to main lion logic 2023-03-09 10:10:19 -08:00
Tim Dettmers
2489d819c5 Added more blocksizes for stochastic rounding; fixed dequant blocksize. 2023-02-14 13:55:17 -08:00
Tim Dettmers
3ac5840c03 Added fp4 quant/dequant and dequant optimizations. 2023-02-04 14:52:04 -08:00
Tom Aarsen
b104ce3b62 Merge branch 'main' into cleanup 2022-11-17 15:22:29 +01:00
Tim Dettmers
6bc2b992be Added blocksizes 2048, 1024, and 512 to blockwise quant. 2022-11-06 16:27:48 -08:00
Tom Aarsen
1eec77d34c Remove trailing whitespace & ensure newline at EOF 2022-10-27 13:11:29 +02:00
Tim Dettmers
19a7adca7a Fixed 2^31 max size issue for cpu blockwise quant. 2022-09-11 11:55:09 -07:00
Tim Dettmers
1ed2fa2f21 Removed storage() from get_ptr; added boilerplate for bias dequant_mm. 2022-08-16 10:56:17 -07:00
Tim Dettmers
5737f2b027 Merge branch 'patch_merge' into extract_outliers 2022-07-26 19:38:01 -07:00
Tim Dettmers
cbb901ac51 Boilerplate and test for extract_outliers. 2022-07-26 12:12:38 -07:00
Tim Dettmers
953b7285dd Fixed cpuonly build. 2022-07-26 09:12:16 -07:00
Tim Dettmers
8b1fd32e3e Fixed makefile; fixed Ampere igemmlt_8 bug. 2022-07-25 14:02:14 -07:00
Tim Dettmers
c771b3a75a Most tests passing. 2022-07-22 14:41:05 -07:00
Max Ryabinin
575aa698fa Reduce diff 2022-07-01 17:41:48 +03:00
Max Ryabinin
4d1d5b569f Reduce diff 2022-07-01 17:40:02 +03:00
Max Ryabinin
8258b4364a Add a CPU-only build option 2022-07-01 17:16:10 +03:00
Tim Dettmers
8b3c0f355c Added adagrad with tests (no clipping). 2021-11-10 15:10:02 -08:00
Tim Dettmers
a6eae2e7f2 Added skip_zeros; tests are passing. 2021-10-20 19:15:47 -07:00
Tim Dettmers
bb34fd50a1 Initial plumbing for skip_zeros. 2021-10-20 18:37:44 -07:00
Tim Dettmers
7439924891 Initial commit 2021-10-05 19:16:20 -07:00