Commit Graph

106 Commits

Author SHA1 Message Date
buaahsh
bc140c65bb fx bert moe 2023-03-05 07:43:58 +00:00
shumingma
32cb51ae38 v0.1.2 2023-03-04 01:11:34 -08:00
Shuming Ma
27d818674f Merge pull request #19 from buaahsh/patch-1
Update README.md
2023-03-04 10:55:52 +08:00
Shaohan Huang
cbdbc1dfc8 Update README.md 2023-03-04 07:37:01 +08:00
shumingma
20c1e6c611 Bert MoE 2023-03-02 02:54:19 -08:00
shumingma
0cb9695501 Remove inplace 2023-01-18 22:44:26 -08:00
shumingma
9f105b591d Support Pytorch LayerNorm 2023-01-16 20:17:28 -08:00
Shuming Ma
82f140a6c4 Merge pull request #12 from microsoft/bsz
Batch size first
2023-01-16 20:07:52 +08:00
shumingma
1a5d2c26fe Batch size first 2023-01-05 01:19:51 -08:00
Shuming Ma
776b070d68 Merge pull request #11 from microsoft/xpos
Adding the official implementation of Xpos (https://arxiv.org/abs/2212.10554)
2023-01-05 11:08:13 +08:00
shumingma
9d968a24ed Update XPos 2023-01-03 22:54:24 -08:00
shumingma
f9d98f4b68 Add XPOS 2022-12-29 20:48:43 -08:00
shumingma
aa36203042 Fix multiway checkpointing 2022-12-27 22:32:02 -08:00
gitnlp
22438a8525 Update README.md 2022-12-23 08:26:08 +08:00
Shuming Ma
21ed0056d7 Merge pull request #9 from MatthewChang/fix_output_projection_decoder
fix a bug that overrides the default constructed output_projection
2022-12-22 11:19:25 +08:00
Matthew Chang
adcd995595 fix a bug which overrides the default constructed output_projection when none is passed in
2022-12-21 16:24:44 -06:00
shumingma
7e12b582e4 Support latest fairseq 2022-12-15 03:44:15 -08:00
shumingma
2518ea030c Fix example fsdp 2022-12-08 04:20:27 -08:00
Shuming Ma
6d62bbbf67 Merge pull request #8 from buaahsh/main
don't need attn weight in decoder
2022-12-06 19:06:26 +08:00
buaahsh
2005ab1f26 don't need attn weight in decoder 2022-12-06 18:31:17 +08:00
shumingma
be167b3dda Add an example for vocab 2022-12-01 20:40:09 -08:00
shumingma
7b29d32f03 Remove unused parameters 2022-11-29 21:36:03 -08:00
Shuming Ma
5adbe971cf Merge pull request #5 from kashif/typo
fix typo
2022-11-29 18:00:13 +08:00
Kashif Rasul
e8be99f8f1 fix typo 2022-11-29 10:48:56 +01:00
Shuming Ma
559b5fdf56 Merge pull request #4 from kashif/kashif-patch-1
remove lambda
2022-11-29 12:21:53 +08:00
Kashif Rasul
c69aba2a73 fix call to activation_fn 2022-11-29 00:11:38 +01:00
Kashif Rasul
be14bc23a1 typo 2022-11-29 00:11:02 +01:00
Kashif Rasul
e7d5ec2ad7 remove lambda 2022-11-29 00:02:26 +01:00
gitnlp
c0ad46d7b8 Update README.md 2022-11-28 22:29:46 +08:00
gitnlp
800ea8d39f Update README.md 2022-11-27 22:45:31 +08:00
gitnlp
8dd8055826 Update README.md 2022-11-27 22:38:02 +08:00
shumingma
7eca1a531c Code reformatting 2022-11-26 09:01:02 -08:00
shumingma
1354614d44 Update config file 2022-11-26 08:15:08 -08:00
shumingma
994e4665a2 flake8 lint checks 2022-11-26 08:10:15 -08:00
shumingma
4714557e89 Update features section 2022-11-24 20:42:10 -08:00
shumingma
5cbb7980a9 Add features section 2022-11-24 01:06:46 -08:00
Li Dong
afd9094fb5 Merge pull request #1 from buaahsh/main
Fix decoder_embed_dim in Fairseq example
2022-11-24 15:54:50 +08:00
Shaohan Huang
bdf759f116 decoder_embed_dim -> args.decoder_embed_dim 2022-11-24 14:30:39 +08:00
Li Dong
1fce6ee98b Update README.md 2022-11-24 13:51:25 +08:00
shumingma
be3cf93e84 Add paper link 2022-11-23 21:44:52 -08:00
shumingma
05636d0eb4 change pic path 2022-11-23 20:33:29 -08:00
shumingma
79284f5e8a Merge branch 'main' of https://github.com/microsoft/torchscale into main 2022-11-23 20:31:00 -08:00
Li Dong
78f6e8a205 Update README.md
xmoe bibtex
2022-11-23 20:29:19 -08:00
gitnlp
5042ce960d Update README.md 2022-11-23 20:28:29 -08:00
shumingma
ec24e55f6a update pic path 2022-11-23 20:25:12 -08:00
Li Dong
660a291402 Update README.md
xmoe bibtex
2022-11-24 11:40:38 +08:00
gitnlp
51abba7c8b Update README.md 2022-11-24 09:29:34 +08:00
shumingma
65fe50f466 update copyright 2022-11-23 08:36:55 -08:00
shumingma
ede048831f torchscale released 2022-11-23 08:21:58 -08:00
shumingma
41f6ee5687 Update README.md 2022-11-17 01:18:20 -08:00