| Author | Commit | Message | Date |
|---|---|---|---|
| gitnlp | f58c8247be | Update README.md | 2023-07-26 18:39:23 +08:00 |
| gitnlp | 774003903e | Update README.md | 2023-07-26 18:38:49 +08:00 |
| Li Dong | bf65397b26 | RetNet | 2023-07-24 14:30:13 +08:00 |
| Shuming Ma | 89d8c6de9e | Fix encdec config issue | 2023-07-23 23:07:21 -07:00 |
| Shuming Ma | 2b101355d7 | Merge pull request #33 from XingxingZhang/lm_sampling: support lm prefix computation in one go | 2023-06-04 02:01:54 +08:00 |
| Shuming Ma | 859c4c6bcc | Merge pull request #31 from klae01/feature/pytest: add basic test | 2023-06-04 02:01:29 +08:00 |
| Shuming Ma | 0cef83d675 | Merge pull request #27 from JonathanRayner/patch-1: Bump timm version to latest | 2023-06-04 02:01:04 +08:00 |
| Xingxing Zhang | c766630327 | support lm prefix computation in one go | 2023-06-03 15:37:47 +00:00 |
| klae01 | d675712846 | add basic test | 2023-05-26 13:30:00 +09:00 |
| Shuming Ma | b59b6f87b9 | Merge pull request #30 from njb-ms/main: make pgs global | 2023-04-26 12:12:55 +08:00 |
| johan bjorck | a4d830b87d | make num experts optional arg | 2023-04-24 17:29:39 +00:00 |
| johan bjorck | 886c8ab408 | make pgs global | 2023-04-22 00:32:05 +00:00 |
| Jonathan Rayner | 691e843ed5 | Bump timm version to latest | 2023-04-11 11:02:21 +01:00 |
| Shuming Ma | 4ae3b248ee | Merge pull request #21 from microsoft/0.2.0: v0.2.0 | 2023-03-15 15:21:52 +08:00 |
| Shuming Ma | 73b766812a | v0.2.0 | 2023-03-15 00:20:00 -07:00 |
| Li Dong | 36c0e55004 | Update README.md | 2023-03-13 21:40:20 +08:00 |
| Wenhui Wang | 599df73687 | b3 incremental decoding | 2023-03-09 12:02:36 +08:00 |
| shumingma | 891f84f302 | Fix MoE sample size | 2023-03-08 01:19:36 -08:00 |
| shumingma | 0a07df1e5b | Update Bert MoE | 2023-03-07 21:21:48 -08:00 |
| shumingma | c397ebb013 | Fix Bert MoE | 2023-03-07 21:11:05 -08:00 |
| shumingma | 670113e446 | Update MoE criterions | 2023-03-07 20:53:41 -08:00 |
| shumingma | 8d8b80a731 | Merge branch 'main' of https://github.com/microsoft/torchscale into main | 2023-03-05 19:24:31 -08:00 |
| shumingma | a788e67ef2 | Fix Bert dense | 2023-03-05 19:24:14 -08:00 |
| Shuming Ma | a491af1113 | Merge pull request #20 from buaahsh/main: fx BERT + moe | 2023-03-05 23:31:45 +08:00 |
| Shaohan Huang | 5b0be94ab8 | add --pad-to-max-length in bert+moe example | 2023-03-05 19:39:04 +08:00 |
| Shaohan Huang | 95aea9c1b4 | set numpy version | 2023-03-05 19:36:07 +08:00 |
| buaahsh | bc140c65bb | fx bert moe | 2023-03-05 07:43:58 +00:00 |
| shumingma | 32cb51ae38 | v0.1.2 | 2023-03-04 01:11:34 -08:00 |
| Shuming Ma | 27d818674f | Merge pull request #19 from buaahsh/patch-1: Update README.md | 2023-03-04 10:55:52 +08:00 |
| Shaohan Huang | cbdbc1dfc8 | Update README.md | 2023-03-04 07:37:01 +08:00 |
| shumingma | 20c1e6c611 | Bert MoE | 2023-03-02 02:54:19 -08:00 |
| shumingma | 0cb9695501 | Remove inplace | 2023-01-18 22:44:26 -08:00 |
| shumingma | 9f105b591d | Support Pytorch LayerNorm | 2023-01-16 20:17:28 -08:00 |
| Shuming Ma | 82f140a6c4 | Merge pull request #12 from microsoft/bsz: Batch size first | 2023-01-16 20:07:52 +08:00 |
| shumingma | 1a5d2c26fe | Batch size first | 2023-01-05 01:19:51 -08:00 |
| Shuming Ma | 776b070d68 | Merge pull request #11 from microsoft/xpos: Adding the official implementation of Xpos (https://arxiv.org/abs/2212.10554) | 2023-01-05 11:08:13 +08:00 |
| shumingma | 9d968a24ed | Update XPos | 2023-01-03 22:54:24 -08:00 |
| shumingma | f9d98f4b68 | Add XPOS | 2022-12-29 20:48:43 -08:00 |
| shumingma | aa36203042 | Fix multiway checkpointing | 2022-12-27 22:32:02 -08:00 |
| gitnlp | 22438a8525 | Update README.md | 2022-12-23 08:26:08 +08:00 |
| Shuming Ma | 21ed0056d7 | Merge pull request #9 from MatthewChang/fix_output_projection_decoder: fix a bug that overrides the default constructed output_projection | 2022-12-22 11:19:25 +08:00 |
| Matthew Chang | adcd995595 | fix a bug which overrides the default constructed output_projection when none is passed in | 2022-12-21 16:24:44 -06:00 |
| shumingma | 7e12b582e4 | Support latest fairseq | 2022-12-15 03:44:15 -08:00 |
| shumingma | 2518ea030c | Fix example fsdp | 2022-12-08 04:20:27 -08:00 |
| Shuming Ma | 6d62bbbf67 | Merge pull request #8 from buaahsh/main: don't need attn weight in decoder | 2022-12-06 19:06:26 +08:00 |
| buaahsh | 2005ab1f26 | don't need attn weight in decoder | 2022-12-06 18:31:17 +08:00 |
| shumingma | be167b3dda | Add an example for vocab | 2022-12-01 20:40:09 -08:00 |
| shumingma | 7b29d32f03 | Remove unused parameters | 2022-11-29 21:36:03 -08:00 |
| Shuming Ma | 5adbe971cf | Merge pull request #5 from kashif/typo: fix typo | 2022-11-29 18:00:13 +08:00 |
| Kashif Rasul | e8be99f8f1 | fix typo | 2022-11-29 10:48:56 +01:00 |