Commit Graph

106 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| mrq | 008f1b6d18 | added compat flags because I guess the maintainer assumed no one was actually using the retnet and thinks they can change things willy nilly | 2023-10-05 16:38:57 -05:00 |
| mrq | ce77afe916 | added arg to change RelPos's base | 2023-10-05 16:02:08 -05:00 |
| Shuming Ma | 881d03079d | Merge pull request #70 from sunyt32/retnet-official: fix fairseq example | 2023-09-29 12:59:56 +08:00 |
| sunyt32 | 50174a3078 | fix fairseq example | 2023-09-29 03:50:24 +00:00 |
| Shuming Ma | ab1d9d677a | Merge pull request #69 from sunyt32/retnet-official: Update new RetNet settings | 2023-09-29 10:07:48 +08:00 |
| sunyt32 | 05a9628309 | fix bug | 2023-09-28 17:39:26 +00:00 |
| sunyt32 | 59fc5f7d3d | fix bug | 2023-09-28 17:05:53 +00:00 |
| sunyt32 | fd8234c2ac | rollback variant name | 2023-09-28 16:44:51 +00:00 |
| sunyt32 | 7f07609361 | update README.md | 2023-09-28 14:26:56 +00:00 |
| sunyt32 | 5c89ffbeea | modify rms norm and value dim in retention | 2023-09-28 14:24:37 +00:00 |
| Li Dong | d1fefe9c22 | rollback LN epsilon in retention (rollback of 2c29de0fb3) | 2023-09-27 20:40:36 +08:00 |
| Shuming Ma | 258eda3308 | Update vocab links | 2023-08-11 16:46:37 +08:00 |
| Shuming Ma | 70e047a53b | Update README.md | 2023-08-11 11:26:19 +08:00 |
| Li Dong | 8b07f19ba0 | Update README.md | 2023-08-10 13:15:42 +08:00 |
| Shuming Ma | e2db7ae123 | Merge pull request #51 from sunyt32/retnet-official: fix chunkwise inconsistency bug | 2023-08-04 13:51:53 +08:00 |
| sunyt32 | 0b1f113985 | fix chunkwise inconsistency bug | 2023-08-04 05:48:58 +00:00 |
| Li Dong | 0faee72d6f | Merge pull request #50 from wangmengzhi/main-2: Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward | 2023-08-04 09:02:38 +08:00 |
| wangmengzhi | 7f0bf80a7e | Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward (avoids numerical underflow, improving consistency and performance; see https://github.com/microsoft/torchscale/issues/47) | 2023-08-04 08:18:10 +08:00 |
| Shuming Ma | 7d231743f4 | Merge pull request #46 from sunyt32/retnet-official: Update epsilon in retention | 2023-08-02 14:07:44 +08:00 |
| sunyt32 | 2c29de0fb3 | Update epsilon in retention | 2023-08-02 05:38:30 +00:00 |
| Shuming Ma | 5356b252c4 | Update MT config | 2023-07-31 09:17:03 -07:00 |
| gitnlp | ea07735c7b | Update README.md | 2023-07-27 12:33:20 +08:00 |
| gitnlp | 63a4f2df2e | Update README.md | 2023-07-27 12:32:37 +08:00 |
| gitnlp | 3573227c88 | Update README.md | 2023-07-26 18:40:30 +08:00 |
| gitnlp | f58c8247be | Update README.md | 2023-07-26 18:39:23 +08:00 |
| gitnlp | 774003903e | Update README.md | 2023-07-26 18:38:49 +08:00 |
| Li Dong | bf65397b26 | RetNet | 2023-07-24 14:30:13 +08:00 |
| Shuming Ma | 89d8c6de9e | Fix encdec config issue | 2023-07-23 23:07:21 -07:00 |
| Shuming Ma | 2b101355d7 | Merge pull request #33 from XingxingZhang/lm_sampling: support lm prefix computation in one go | 2023-06-04 02:01:54 +08:00 |
| Shuming Ma | 859c4c6bcc | Merge pull request #31 from klae01/feature/pytest: add basic test | 2023-06-04 02:01:29 +08:00 |
| Shuming Ma | 0cef83d675 | Merge pull request #27 from JonathanRayner/patch-1: Bump timm version to latest | 2023-06-04 02:01:04 +08:00 |
| Xingxing Zhang | c766630327 | support lm prefix computation in one go | 2023-06-03 15:37:47 +00:00 |
| klae01 | d675712846 | add basic test | 2023-05-26 13:30:00 +09:00 |
| Shuming Ma | b59b6f87b9 | Merge pull request #30 from njb-ms/main: make pgs global | 2023-04-26 12:12:55 +08:00 |
| johan bjorck | a4d830b87d | make num experts optional arg | 2023-04-24 17:29:39 +00:00 |
| johan bjorck | 886c8ab408 | make pgs global | 2023-04-22 00:32:05 +00:00 |
| Jonathan Rayner | 691e843ed5 | Bump timm version to latest | 2023-04-11 11:02:21 +01:00 |
| Shuming Ma | 4ae3b248ee | Merge pull request #21 from microsoft/0.2.0: v0.2.0 | 2023-03-15 15:21:52 +08:00 |
| Shuming Ma | 73b766812a | v0.2.0 | 2023-03-15 00:20:00 -07:00 |
| Li Dong | 36c0e55004 | Update README.md | 2023-03-13 21:40:20 +08:00 |
| Wenhui Wang | 599df73687 | b3 incremental decoding | 2023-03-09 12:02:36 +08:00 |
| shumingma | 891f84f302 | Fix MoE sample size | 2023-03-08 01:19:36 -08:00 |
| shumingma | 0a07df1e5b | Update Bert MoE | 2023-03-07 21:21:48 -08:00 |
| shumingma | c397ebb013 | Fix Bert MoE | 2023-03-07 21:11:05 -08:00 |
| shumingma | 670113e446 | Update MoE criterions | 2023-03-07 20:53:41 -08:00 |
| shumingma | 8d8b80a731 | Merge branch 'main' of https://github.com/microsoft/torchscale into main | 2023-03-05 19:24:31 -08:00 |
| shumingma | a788e67ef2 | Fix Bert dense | 2023-03-05 19:24:14 -08:00 |
| Shuming Ma | a491af1113 | Merge pull request #20 from buaahsh/main: fx BERT + moe | 2023-03-05 23:31:45 +08:00 |
| Shaohan Huang | 5b0be94ab8 | add --pad-to-max-length in bert+moe example | 2023-03-05 19:39:04 +08:00 |
| Shaohan Huang | 95aea9c1b4 | set numpy version | 2023-03-05 19:36:07 +08:00 |
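One entry above (commit 7f0bf80a7e, "Adding sqrt in the recurrent_forward of retnet") argues that normalizing with a square root keeps the recurrent form of retention numerically consistent with the parallel form. The toy below is a minimal scalar sketch of that idea only; it is not the actual torchscale retention code, and the `parallel`/`recurrent` functions, scalar state, and normalizer are illustrative assumptions.

```python
import math

def parallel(xs, gamma):
    # "Parallel" form: for each position n, compute the decayed sum over the
    # whole prefix and normalize by the sqrt of the decayed squared-weight mass.
    outs = []
    for n in range(len(xs)):
        num = sum(gamma ** (n - m) * xs[m] for m in range(n + 1))
        den = math.sqrt(sum(gamma ** (2 * (n - m)) for m in range(n + 1)))
        outs.append(num / den)
    return outs

def recurrent(xs, gamma):
    # "Recurrent" form: carry a decayed-sum state s and a decayed
    # squared-weight mass r, updating both in O(1) per step.
    s, r, outs = 0.0, 0.0, []
    for x in xs:
        s = gamma * s + x
        r = gamma ** 2 * r + 1.0
        # The sqrt here is the point of the commit: dividing by r itself
        # would disagree with the parallel form above.
        outs.append(s / math.sqrt(r))
    return outs
```

With the sqrt in place, both forms produce identical outputs step for step; dropping it makes the recurrent outputs drift from the parallel ones, which is the kind of inconsistency the linked issue (#47) describes.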