Commit Graph

106 Commits (main)

Author SHA1 Message Date
mrq 008f1b6d18 added compat flags because I guess the maintainer assumed no one was actually using the retnet and thinks they can change things willy nilly 2023-10-05 16:38:57 +07:00
mrq ce77afe916 added arg to change RelPos's base 2023-10-05 16:02:08 +07:00
Shuming Ma 881d03079d Merge pull request #70 from sunyt32/retnet-official: fix fairseq example 2023-09-29 12:59:56 +07:00
sunyt32 50174a3078 fix fairseq example 2023-09-29 03:50:24 +07:00
Shuming Ma ab1d9d677a Merge pull request #69 from sunyt32/retnet-official: Update new RetNet settings 2023-09-29 10:07:48 +07:00
sunyt32 05a9628309 fix bug 2023-09-28 17:39:26 +07:00
sunyt32 59fc5f7d3d fix bug 2023-09-28 17:05:53 +07:00
sunyt32 fd8234c2ac rollback variant name 2023-09-28 16:44:51 +07:00
sunyt32 7f07609361 update README.md 2023-09-28 14:26:56 +07:00
sunyt32 5c89ffbeea modify rms norm and value dim in retention 2023-09-28 14:24:37 +07:00
Li Dong d1fefe9c22 rollback LN epsilon in retention: rollback 2c29de0fb3 2023-09-27 20:40:36 +07:00
Shuming Ma 258eda3308 Update vocab links 2023-08-11 16:46:37 +07:00
Shuming Ma 70e047a53b Update README.md 2023-08-11 11:26:19 +07:00
Li Dong 8b07f19ba0 Update README.md 2023-08-10 13:15:42 +07:00
Shuming Ma e2db7ae123 Merge pull request #51 from sunyt32/retnet-official: fix chunkwise inconsistency bug 2023-08-04 13:51:53 +07:00
sunyt32 0b1f113985 fix chunkwise inconsistency bug 2023-08-04 05:48:58 +07:00
Li Dong 0faee72d6f Merge pull request #50 from wangmengzhi/main-2: Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward 2023-08-04 09:02:38 +07:00
wangmengzhi 7f0bf80a7e Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward: Adding sqrt in the recurrent_forward of retnet can avoid numerical underflow thus improving consistency and performance. https://github.com/microsoft/torchscale/issues/47 2023-08-04 08:18:10 +07:00
Shuming Ma 7d231743f4 Merge pull request #46 from sunyt32/retnet-official: Update epsilon in retention 2023-08-02 14:07:44 +07:00
sunyt32 2c29de0fb3 Update epsilon in retention 2023-08-02 05:38:30 +07:00
Shuming Ma 5356b252c4 Update MT config 2023-07-31 09:17:03 +07:00
gitnlp ea07735c7b Update README.md 2023-07-27 12:33:20 +07:00
gitnlp 63a4f2df2e Update README.md 2023-07-27 12:32:37 +07:00
gitnlp 3573227c88 Update README.md 2023-07-26 18:40:30 +07:00
gitnlp f58c8247be Update README.md 2023-07-26 18:39:23 +07:00
gitnlp 774003903e Update README.md 2023-07-26 18:38:49 +07:00
Li Dong bf65397b26 RetNet 2023-07-24 14:30:13 +07:00
Shuming Ma 89d8c6de9e Fix encdec config issue 2023-07-23 23:07:21 +07:00
Shuming Ma 2b101355d7 Merge pull request #33 from XingxingZhang/lm_sampling: support lm prefix computation in one go 2023-06-04 02:01:54 +07:00
Shuming Ma 859c4c6bcc Merge pull request #31 from klae01/feature/pytest: add basic test 2023-06-04 02:01:29 +07:00
Shuming Ma 0cef83d675 Merge pull request #27 from JonathanRayner/patch-1: Bump timm version to latest 2023-06-04 02:01:04 +07:00
Xingxing Zhang c766630327 support lm prefix computation in one go 2023-06-03 15:37:47 +07:00
klae01 d675712846 add basic test 2023-05-26 13:30:00 +07:00
Shuming Ma b59b6f87b9 Merge pull request #30 from njb-ms/main: make pgs global 2023-04-26 12:12:55 +07:00
johan bjorck a4d830b87d make num experts optional arg 2023-04-24 17:29:39 +07:00
johan bjorck 886c8ab408 make pgs global 2023-04-22 00:32:05 +07:00
Jonathan Rayner 691e843ed5 Bump timm version to latest 2023-04-11 11:02:21 +07:00
Shuming Ma 4ae3b248ee Merge pull request #21 from microsoft/0.2.0: v0.2.0 2023-03-15 15:21:52 +07:00
Shuming Ma 73b766812a v0.2.0 2023-03-15 00:20:00 +07:00
Li Dong 36c0e55004 Update README.md 2023-03-13 21:40:20 +07:00
Wenhui Wang 599df73687 b3 incremental decoding 2023-03-09 12:02:36 +07:00
shumingma 891f84f302 Fix MoE sample size 2023-03-08 01:19:36 +07:00
shumingma 0a07df1e5b Update Bert MoE 2023-03-07 21:21:48 +07:00
shumingma c397ebb013 Fix Bert MoE 2023-03-07 21:11:05 +07:00
shumingma 670113e446 Update MoE criterions 2023-03-07 20:53:41 +07:00
shumingma 8d8b80a731 Merge branch 'main' of https://github.com/microsoft/torchscale into main 2023-03-05 19:24:31 +07:00
shumingma a788e67ef2 Fix Bert dense 2023-03-05 19:24:14 +07:00
Shuming Ma a491af1113 Merge pull request #20 from buaahsh/main: fx BERT + moe 2023-03-05 23:31:45 +07:00
Shaohan Huang 5b0be94ab8 add --pad-to-max-length in bert+moe example 2023-03-05 19:39:04 +07:00
Shaohan Huang 95aea9c1b4 set numpy version 2023-03-05 19:36:07 +07:00