Shuming Ma | 881d03079d | Merge pull request #70 from sunyt32/retnet-official: fix fairseq example | 2023-09-29 12:59:56 +08:00
sunyt32 | 50174a3078 | fix fairseq example | 2023-09-29 03:50:24 +00:00
Shuming Ma | ab1d9d677a | Merge pull request #69 from sunyt32/retnet-official: Update new RetNet settings | 2023-09-29 10:07:48 +08:00
sunyt32 | 05a9628309 | fix bug | 2023-09-28 17:39:26 +00:00
sunyt32 | 59fc5f7d3d | fix bug | 2023-09-28 17:05:53 +00:00
sunyt32 | fd8234c2ac | rollback variant name | 2023-09-28 16:44:51 +00:00
sunyt32 | 7f07609361 | update README.md | 2023-09-28 14:26:56 +00:00
sunyt32 | 5c89ffbeea | modify rms norm and value dim in retention | 2023-09-28 14:24:37 +00:00
Li Dong | d1fefe9c22 | rollback LN epsilon in retention: rollback 2c29de0fb3 | 2023-09-27 20:40:36 +08:00
Shuming Ma | 258eda3308 | Update vocab links | 2023-08-11 16:46:37 +08:00
Shuming Ma | 70e047a53b | Update README.md | 2023-08-11 11:26:19 +08:00
Li Dong | 8b07f19ba0 | Update README.md | 2023-08-10 13:15:42 +08:00
Shuming Ma | e2db7ae123 | Merge pull request #51 from sunyt32/retnet-official: fix chunkwise inconsistency bug | 2023-08-04 13:51:53 +08:00
sunyt32 | 0b1f113985 | fix chunkwise inconsistency bug | 2023-08-04 05:48:58 +00:00
Li Dong | 0faee72d6f | Merge pull request #50 from wangmengzhi/main-2: Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward | 2023-08-04 09:02:38 +08:00
wangmengzhi | 7f0bf80a7e | Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward: Adding sqrt in the recurrent_forward of retnet can avoid numerical underflow thus improving consistency and performance. https://github.com/microsoft/torchscale/issues/47 | 2023-08-04 08:18:10 +08:00
Shuming Ma | 7d231743f4 | Merge pull request #46 from sunyt32/retnet-official: Update epsilon in retention | 2023-08-02 14:07:44 +08:00
sunyt32 | 2c29de0fb3 | Update epsilon in retention | 2023-08-02 05:38:30 +00:00
Shuming Ma | 5356b252c4 | Update MT config | 2023-07-31 09:17:03 -07:00
gitnlp | ea07735c7b | Update README.md | 2023-07-27 12:33:20 +08:00
gitnlp | 63a4f2df2e | Update README.md | 2023-07-27 12:32:37 +08:00
gitnlp | 3573227c88 | Update README.md | 2023-07-26 18:40:30 +08:00
gitnlp | f58c8247be | Update README.md | 2023-07-26 18:39:23 +08:00
gitnlp | 774003903e | Update README.md | 2023-07-26 18:38:49 +08:00
Li Dong | bf65397b26 | RetNet | 2023-07-24 14:30:13 +08:00
Shuming Ma | 89d8c6de9e | Fix encdec config issue | 2023-07-23 23:07:21 -07:00
Shuming Ma | 2b101355d7 | Merge pull request #33 from XingxingZhang/lm_sampling: support lm prefix computation in one go | 2023-06-04 02:01:54 +08:00
Shuming Ma | 859c4c6bcc | Merge pull request #31 from klae01/feature/pytest: add basic test | 2023-06-04 02:01:29 +08:00
Shuming Ma | 0cef83d675 | Merge pull request #27 from JonathanRayner/patch-1: Bump timm version to latest | 2023-06-04 02:01:04 +08:00
Xingxing Zhang | c766630327 | support lm prefix computation in one go | 2023-06-03 15:37:47 +00:00
klae01 | d675712846 | add basic test | 2023-05-26 13:30:00 +09:00
Shuming Ma | b59b6f87b9 | Merge pull request #30 from njb-ms/main: make pgs global | 2023-04-26 12:12:55 +08:00
johan bjorck | a4d830b87d | make num experts optional arg | 2023-04-24 17:29:39 +00:00
johan bjorck | 886c8ab408 | make pgs global | 2023-04-22 00:32:05 +00:00
Jonathan Rayner | 691e843ed5 | Bump timm version to latest | 2023-04-11 11:02:21 +01:00
Shuming Ma | 4ae3b248ee | Merge pull request #21 from microsoft/0.2.0: v0.2.0 | 2023-03-15 15:21:52 +08:00
Shuming Ma | 73b766812a | v0.2.0 | 2023-03-15 00:20:00 -07:00
Li Dong | 36c0e55004 | Update README.md | 2023-03-13 21:40:20 +08:00
Wenhui Wang | 599df73687 | b3 incremental decoding | 2023-03-09 12:02:36 +08:00
shumingma | 891f84f302 | Fix MoE sample size | 2023-03-08 01:19:36 -08:00
shumingma | 0a07df1e5b | Update Bert MoE | 2023-03-07 21:21:48 -08:00
shumingma | c397ebb013 | Fix Bert MoE | 2023-03-07 21:11:05 -08:00
shumingma | 670113e446 | Update MoE criterions | 2023-03-07 20:53:41 -08:00
shumingma | 8d8b80a731 | Merge branch 'main' of https://github.com/microsoft/torchscale into main | 2023-03-05 19:24:31 -08:00
shumingma | a788e67ef2 | Fix Bert dense | 2023-03-05 19:24:14 -08:00
Shuming Ma | a491af1113 | Merge pull request #20 from buaahsh/main: fx BERT + moe | 2023-03-05 23:31:45 +08:00
Shaohan Huang | 5b0be94ab8 | add --pad-to-max-length in bert+moe example | 2023-03-05 19:39:04 +08:00
Shaohan Huang | 95aea9c1b4 | set numpy version | 2023-03-05 19:36:07 +08:00
buaahsh | bc140c65bb | fx bert moe | 2023-03-05 07:43:58 +00:00
shumingma | 32cb51ae38 | v0.1.2 | 2023-03-04 01:11:34 -08:00