Commit Graph

96 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Li Dong | d1fefe9c22 | rollback LN epsilon in retention: rollback 2c29de0fb3 | 2023-09-27 20:40:36 +08:00 |
| Shuming Ma | 258eda3308 | Update vocab links | 2023-08-11 16:46:37 +08:00 |
| Shuming Ma | 70e047a53b | Update README.md | 2023-08-11 11:26:19 +08:00 |
| Li Dong | 8b07f19ba0 | Update README.md | 2023-08-10 13:15:42 +08:00 |
| Shuming Ma | e2db7ae123 | Merge pull request #51 from sunyt32/retnet-official: fix chunkwise inconsistency bug | 2023-08-04 13:51:53 +08:00 |
| sunyt32 | 0b1f113985 | fix chunkwise inconsistency bug | 2023-08-04 05:48:58 +00:00 |
| Li Dong | 0faee72d6f | Merge pull request #50 from wangmengzhi/main-2: Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward | 2023-08-04 09:02:38 +08:00 |
| wangmengzhi | 7f0bf80a7e | Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward: Adding sqrt in the recurrent_forward of retnet can avoid numerical underflow thus improving consistency and performance. https://github.com/microsoft/torchscale/issues/47 | 2023-08-04 08:18:10 +08:00 |
| Shuming Ma | 7d231743f4 | Merge pull request #46 from sunyt32/retnet-official: Update epsilon in retention | 2023-08-02 14:07:44 +08:00 |
| sunyt32 | 2c29de0fb3 | Update epsilon in retention | 2023-08-02 05:38:30 +00:00 |
| Shuming Ma | 5356b252c4 | Update MT config | 2023-07-31 09:17:03 -07:00 |
| gitnlp | ea07735c7b | Update README.md | 2023-07-27 12:33:20 +08:00 |
| gitnlp | 63a4f2df2e | Update README.md | 2023-07-27 12:32:37 +08:00 |
| gitnlp | 3573227c88 | Update README.md | 2023-07-26 18:40:30 +08:00 |
| gitnlp | f58c8247be | Update README.md | 2023-07-26 18:39:23 +08:00 |
| gitnlp | 774003903e | Update README.md | 2023-07-26 18:38:49 +08:00 |
| Li Dong | bf65397b26 | RetNet | 2023-07-24 14:30:13 +08:00 |
| Shuming Ma | 89d8c6de9e | Fix encdec config issue | 2023-07-23 23:07:21 -07:00 |
| Shuming Ma | 2b101355d7 | Merge pull request #33 from XingxingZhang/lm_sampling: support lm prefix computation in one go | 2023-06-04 02:01:54 +08:00 |
| Shuming Ma | 859c4c6bcc | Merge pull request #31 from klae01/feature/pytest: add basic test | 2023-06-04 02:01:29 +08:00 |
| Shuming Ma | 0cef83d675 | Merge pull request #27 from JonathanRayner/patch-1: Bump timm version to latest | 2023-06-04 02:01:04 +08:00 |
| Xingxing Zhang | c766630327 | support lm prefix computation in one go | 2023-06-03 15:37:47 +00:00 |
| klae01 | d675712846 | add basic test | 2023-05-26 13:30:00 +09:00 |
| Shuming Ma | b59b6f87b9 | Merge pull request #30 from njb-ms/main: make pgs global | 2023-04-26 12:12:55 +08:00 |
| johan bjorck | a4d830b87d | make num experts optional arg | 2023-04-24 17:29:39 +00:00 |
| johan bjorck | 886c8ab408 | make pgs global | 2023-04-22 00:32:05 +00:00 |
| Jonathan Rayner | 691e843ed5 | Bump timm version to latest | 2023-04-11 11:02:21 +01:00 |
| Shuming Ma | 4ae3b248ee | Merge pull request #21 from microsoft/0.2.0: v0.2.0 | 2023-03-15 15:21:52 +08:00 |
| Shuming Ma | 73b766812a | v0.2.0 | 2023-03-15 00:20:00 -07:00 |
| Li Dong | 36c0e55004 | Update README.md | 2023-03-13 21:40:20 +08:00 |
| Wenhui Wang | 599df73687 | b3 incremental decoding | 2023-03-09 12:02:36 +08:00 |
| shumingma | 891f84f302 | Fix MoE sample size | 2023-03-08 01:19:36 -08:00 |
| shumingma | 0a07df1e5b | Update Bert MoE | 2023-03-07 21:21:48 -08:00 |
| shumingma | c397ebb013 | Fix Bert MoE | 2023-03-07 21:11:05 -08:00 |
| shumingma | 670113e446 | Update MoE criterions | 2023-03-07 20:53:41 -08:00 |
| shumingma | 8d8b80a731 | Merge branch 'main' of https://github.com/microsoft/torchscale into main | 2023-03-05 19:24:31 -08:00 |
| shumingma | a788e67ef2 | Fix Bert dense | 2023-03-05 19:24:14 -08:00 |
| Shuming Ma | a491af1113 | Merge pull request #20 from buaahsh/main: fx BERT + moe | 2023-03-05 23:31:45 +08:00 |
| Shaohan Huang | 5b0be94ab8 | add --pad-to-max-length in bert+moe example | 2023-03-05 19:39:04 +08:00 |
| Shaohan Huang | 95aea9c1b4 | set numpy version | 2023-03-05 19:36:07 +08:00 |
| buaahsh | bc140c65bb | fx bert moe | 2023-03-05 07:43:58 +00:00 |
| shumingma | 32cb51ae38 | v0.1.2 | 2023-03-04 01:11:34 -08:00 |
| Shuming Ma | 27d818674f | Merge pull request #19 from buaahsh/patch-1: Update README.md | 2023-03-04 10:55:52 +08:00 |
| Shaohan Huang | cbdbc1dfc8 | Update README.md | 2023-03-04 07:37:01 +08:00 |
| shumingma | 20c1e6c611 | Bert MoE | 2023-03-02 02:54:19 -08:00 |
| shumingma | 0cb9695501 | Remove inplace | 2023-01-18 22:44:26 -08:00 |
| shumingma | 9f105b591d | Support Pytorch LayerNorm | 2023-01-16 20:17:28 -08:00 |
| Shuming Ma | 82f140a6c4 | Merge pull request #12 from microsoft/bsz: Batch size first | 2023-01-16 20:07:52 +08:00 |
| shumingma | 1a5d2c26fe | Batch size first | 2023-01-05 01:19:51 -08:00 |
| Shuming Ma | 776b070d68 | Merge pull request #11 from microsoft/xpos: Adding the official implementation of Xpos (https://arxiv.org/abs/2212.10554) | 2023-01-05 11:08:13 +08:00 |