Commit Graph

73 Commits

Author SHA1 Message Date
Xingxing Zhang
c766630327 support lm prefix computation in one go 2023-06-03 15:37:47 +00:00
Shuming Ma
b59b6f87b9
Merge pull request #30 from njb-ms/main
make pgs global
2023-04-26 12:12:55 +08:00
johan bjorck
a4d830b87d make num experts optional arg 2023-04-24 17:29:39 +00:00
johan bjorck
886c8ab408 make pgs global 2023-04-22 00:32:05 +00:00
Shuming Ma
4ae3b248ee
Merge pull request #21 from microsoft/0.2.0
v0.2.0
2023-03-15 15:21:52 +08:00
Shuming Ma
73b766812a v0.2.0 2023-03-15 00:20:00 -07:00
Li Dong
36c0e55004 Update README.md 2023-03-13 21:40:20 +08:00
Wenhui Wang
599df73687 b3 incremental decoding 2023-03-09 12:02:36 +08:00
shumingma
891f84f302 Fix MoE sample size 2023-03-08 01:19:36 -08:00
shumingma
0a07df1e5b Update Bert MoE 2023-03-07 21:21:48 -08:00
shumingma
c397ebb013 Fix Bert MoE 2023-03-07 21:11:05 -08:00
shumingma
670113e446 Update MoE criterions 2023-03-07 20:53:41 -08:00
shumingma
8d8b80a731 Merge branch 'main' of https://github.com/microsoft/torchscale into main 2023-03-05 19:24:31 -08:00
shumingma
a788e67ef2 Fix Bert dense 2023-03-05 19:24:14 -08:00
Shuming Ma
a491af1113
Merge pull request #20 from buaahsh/main
fx BERT + moe
2023-03-05 23:31:45 +08:00
Shaohan Huang
5b0be94ab8 add --pad-to-max-length in bert+moe example 2023-03-05 19:39:04 +08:00
Shaohan Huang
95aea9c1b4 set numpy version 2023-03-05 19:36:07 +08:00
buaahsh
bc140c65bb fx bert moe 2023-03-05 07:43:58 +00:00
shumingma
32cb51ae38 v0.1.2 2023-03-04 01:11:34 -08:00
Shuming Ma
27d818674f
Merge pull request #19 from buaahsh/patch-1
Update README.md
2023-03-04 10:55:52 +08:00
Shaohan Huang
cbdbc1dfc8 Update README.md 2023-03-04 07:37:01 +08:00
shumingma
20c1e6c611 Bert MoE 2023-03-02 02:54:19 -08:00
shumingma
0cb9695501 Remove inplace 2023-01-18 22:44:26 -08:00
shumingma
9f105b591d Support Pytorch LayerNorm 2023-01-16 20:17:28 -08:00
Shuming Ma
82f140a6c4
Merge pull request #12 from microsoft/bsz
Batch size first
2023-01-16 20:07:52 +08:00
shumingma
1a5d2c26fe Batch size first 2023-01-05 01:19:51 -08:00
Shuming Ma
776b070d68
Merge pull request #11 from microsoft/xpos
Adding the official implementation of Xpos (https://arxiv.org/abs/2212.10554)
2023-01-05 11:08:13 +08:00
shumingma
9d968a24ed Update XPos 2023-01-03 22:54:24 -08:00
shumingma
f9d98f4b68 Add XPOS 2022-12-29 20:48:43 -08:00
shumingma
aa36203042 Fix multiway checkpointing 2022-12-27 22:32:02 -08:00
gitnlp
22438a8525 Update README.md 2022-12-23 08:26:08 +08:00
Shuming Ma
21ed0056d7
Merge pull request #9 from MatthewChang/fix_output_projection_decoder
fix a bug that overrides the default constructed output_projection
2022-12-22 11:19:25 +08:00
Matthew Chang
adcd995595 fix a bug which overrides the default constructed output_projection when none is passed in 2022-12-21 16:24:44 -06:00
shumingma
7e12b582e4 Support latest fairseq 2022-12-15 03:44:15 -08:00
shumingma
2518ea030c Fix example fsdp 2022-12-08 04:20:27 -08:00
Shuming Ma
6d62bbbf67
Merge pull request #8 from buaahsh/main
don't need attn weight in decoder
2022-12-06 19:06:26 +08:00
buaahsh
2005ab1f26 don't need attn weight in decoder 2022-12-06 18:31:17 +08:00
shumingma
be167b3dda Add an example for vocab 2022-12-01 20:40:09 -08:00
shumingma
7b29d32f03 Remove unused parameters 2022-11-29 21:36:03 -08:00
Shuming Ma
5adbe971cf
Merge pull request #5 from kashif/typo
fix typo
2022-11-29 18:00:13 +08:00
Kashif Rasul
e8be99f8f1 fix typo 2022-11-29 10:48:56 +01:00
Shuming Ma
559b5fdf56
Merge pull request #4 from kashif/kashif-patch-1
remove lambda
2022-11-29 12:21:53 +08:00
Kashif Rasul
c69aba2a73 fix call to activation_fn 2022-11-29 00:11:38 +01:00
Kashif Rasul
be14bc23a1 typo 2022-11-29 00:11:02 +01:00
Kashif Rasul
e7d5ec2ad7 remove lambda 2022-11-29 00:02:26 +01:00
gitnlp
c0ad46d7b8 Update README.md 2022-11-28 22:29:46 +08:00
gitnlp
800ea8d39f Update README.md 2022-11-27 22:45:31 +08:00
gitnlp
8dd8055826 Update README.md 2022-11-27 22:38:02 +08:00
shumingma
7eca1a531c Code reformatting 2022-11-26 09:01:02 -08:00
shumingma
1354614d44 Update config file 2022-11-26 08:15:08 -08:00