Commit Graph

  • 008f1b6d18 added compat flags because I guess the maintainer assumed no one was actually using the retnet and thinks they can change things willy nilly main mrq 2023-10-05 16:38:57 -0500
  • ce77afe916 added arg to change RelPos's base mrq 2023-09-19 19:18:49 -0500
  • 881d03079d Merge pull request #70 from sunyt32/retnet-official Shuming Ma 2023-09-29 12:59:56 +0800
  • 50174a3078 fix fairseq example sunyt32 2023-09-29 03:50:24 +0000
  • ab1d9d677a Merge pull request #69 from sunyt32/retnet-official Shuming Ma 2023-09-29 10:07:48 +0800
  • 05a9628309 fix bug sunyt32 2023-09-28 17:39:26 +0000
  • 59fc5f7d3d fix bug sunyt32 2023-09-28 17:05:53 +0000
  • fd8234c2ac rollback variant name sunyt32 2023-09-28 16:44:51 +0000
  • 7f07609361 update README.md sunyt32 2023-09-28 14:26:56 +0000
  • 5c89ffbeea modify rms norm and value dim in retention sunyt32 2023-09-28 14:24:37 +0000
  • d1fefe9c22 rollback LN epsilon in retention Li Dong 2023-09-27 20:40:36 +0800
  • ac1c1d9ba6 Merge 7bfdad13f8 into 258eda3308 Alexander Goryunov 2023-08-15 11:59:09 +0300
  • 5d16e572d5 Merge dd69dcb5e9 into 258eda3308 usryokousha 2023-08-12 19:07:42 +0900
  • 258eda3308 Update vocab links Shuming Ma 2023-08-11 16:46:37 +0800
  • 70e047a53b Update README.md Shuming Ma 2023-08-11 11:26:19 +0800
  • 8b07f19ba0 Update README.md Li Dong 2023-08-10 13:15:42 +0800
  • e2db7ae123 Merge pull request #51 from sunyt32/retnet-official Shuming Ma 2023-08-04 13:51:53 +0800
  • 0b1f113985 fix chunkwise inconsistency bug sunyt32 2023-08-04 05:48:58 +0000
  • 0faee72d6f Merge pull request #50 from wangmengzhi/main-2 Li Dong 2023-08-04 09:02:38 +0800
  • 7f0bf80a7e Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward wangmengzhi 2023-08-04 08:18:10 +0800
  • 7d231743f4 Merge pull request #46 from sunyt32/retnet-official Shuming Ma 2023-08-02 14:07:44 +0800
  • 2c29de0fb3 Update epsilon in retention sunyt32 2023-08-02 05:38:30 +0000
  • 5356b252c4 Update MT config Shuming Ma 2023-07-31 09:17:03 -0700
  • ea07735c7b Update README.md gitnlp 2023-07-27 12:33:20 +0800
  • 63a4f2df2e Update README.md gitnlp 2023-07-27 12:32:37 +0800
  • 3573227c88 Update README.md gitnlp 2023-07-26 18:40:30 +0800
  • f58c8247be Update README.md gitnlp 2023-07-26 18:39:23 +0800
  • 774003903e Update README.md gitnlp 2023-07-26 18:38:49 +0800
  • bf65397b26 RetNet Li Dong 2023-07-24 14:30:13 +0800
  • 89d8c6de9e Fix encdec config issue Shuming Ma 2023-07-23 23:07:21 -0700
  • ef91197a4f Merge 61110ca844 into 2b101355d7 Alexander Goryunov 2023-07-10 18:02:25 +0000
  • 61110ca844 formatting Alexander Goryunov 2023-07-10 20:57:42 +0300
  • 341ef458b4 type hints Alexander Goryunov 2023-07-10 20:48:32 +0300
  • a2063b7000 Rework configs to remove redundant code Alexander Goryunov 2023-07-10 20:48:20 +0300
  • 7bfdad13f8 Remove inheritance from object Alexander Goryunov 2023-07-10 19:29:01 +0300
  • 2b101355d7 Merge pull request #33 from XingxingZhang/lm_sampling Shuming Ma 2023-06-04 02:01:54 +0800
  • 859c4c6bcc Merge pull request #31 from klae01/feature/pytest Shuming Ma 2023-06-04 02:01:29 +0800
  • 0cef83d675 Merge pull request #27 from JonathanRayner/patch-1 Shuming Ma 2023-06-04 02:01:04 +0800
  • c766630327 support lm prefix computation in one go Xingxing Zhang 2023-06-03 15:37:47 +0000
  • d675712846 add basic test klae01 2023-05-26 13:29:12 +0900
  • dd69dcb5e9 Merge pull request #1 from mranzinger/efficient usryokousha 2023-05-17 22:22:53 +0900
  • 29c6eadb83 Masks are now optional, and not created. Fixes required to support FlashAttention (e.g. no mask, fp/bf16) Mike Ranzinger 2023-05-09 19:21:25 +0000
  • b59b6f87b9 Merge pull request #30 from njb-ms/main Shuming Ma 2023-04-26 12:12:55 +0800
  • a4d830b87d make num experts optional arg johan bjorck 2023-04-24 17:29:39 +0000
  • 62cedb9c8f Update multihead_attention.py Mike Ranzinger 2023-04-23 18:45:48 -0700
  • d4a62ccfb5 Update multihead_attention.py Mike Ranzinger 2023-04-23 18:28:08 -0700
  • 412a1a3878 Update multihead_attention.py Mike Ranzinger 2023-04-23 18:17:41 -0700
  • a5a94191a1 Update multihead_attention.py Mike Ranzinger 2023-04-23 18:08:47 -0700
  • 886c8ab408 make pgs global johan bjorck 2023-04-22 00:32:05 +0000
  • 691e843ed5 Bump timm version to latest Jonathan Rayner 2023-04-11 11:02:21 +0100
  • 18677d237d Merge 85e3534ef4 into 4ae3b248ee alveranuno 2023-04-08 01:25:17 +0300
  • 85e3534ef4 Bug bounty test - please ignore.... (mabcgj) alveranuno 2023-04-07 22:25:14 +0000
  • 37b64d41ce incorporated fast attention into attention Matthew Smith 2023-03-31 11:15:36 +0900
  • 4ae3b248ee Merge pull request #21 from microsoft/0.2.0 Shuming Ma 2023-03-15 15:21:52 +0800
  • 73b766812a v0.2.0 0.2.0 Shuming Ma 2023-03-15 00:20:00 -0700
  • 36c0e55004 Update README.md Li Dong 2023-03-13 21:40:20 +0800
  • 599df73687 b3 incremental decoding Wenhui Wang 2023-03-09 12:02:36 +0800
  • 891f84f302 Fix MoE sample size shumingma 2023-03-08 01:19:36 -0800
  • 0a07df1e5b Update Bert MoE shumingma 2023-03-07 21:21:48 -0800
  • c397ebb013 Fix Bert MoE shumingma 2023-03-07 21:11:05 -0800
  • 670113e446 Update MoE criterions shumingma 2023-03-07 20:53:41 -0800
  • 8d8b80a731 Merge branch 'main' of https://github.com/microsoft/torchscale into main shumingma 2023-03-05 19:24:31 -0800
  • a788e67ef2 Fix Bert dense shumingma 2023-03-05 19:24:14 -0800
  • a491af1113 Merge pull request #20 from buaahsh/main Shuming Ma 2023-03-05 23:31:45 +0800
  • 5b0be94ab8 add --pad-to-max-length in bert+moe example Shaohan Huang 2023-03-05 19:39:04 +0800
  • 95aea9c1b4 set numpy version Shaohan Huang 2023-03-05 19:36:07 +0800
  • bc140c65bb fx bert moe buaahsh 2023-03-05 07:43:58 +0000
  • 32cb51ae38 v0.1.2 shumingma 2023-03-04 01:11:34 -0800
  • 27d818674f Merge pull request #19 from buaahsh/patch-1 Shuming Ma 2023-03-04 10:55:52 +0800
  • cbdbc1dfc8 Update README.md Shaohan Huang 2023-03-04 07:37:01 +0800
  • 20c1e6c611 Bert MoE shumingma 2023-03-02 02:54:19 -0800
  • 0cb9695501 Remove inplace shumingma 2023-01-18 22:44:26 -0800
  • 9f105b591d Support Pytorch LayerNorm shumingma 2023-01-16 20:17:28 -0800
  • 82f140a6c4 Merge pull request #12 from microsoft/bsz Shuming Ma 2023-01-16 20:07:52 +0800
  • 1a5d2c26fe Batch size first bsz shumingma 2023-01-05 01:19:51 -0800
  • 776b070d68 Merge pull request #11 from microsoft/xpos Shuming Ma 2023-01-05 11:08:13 +0800
  • 9d968a24ed Update XPos xpos shumingma 2023-01-03 22:54:24 -0800
  • f9d98f4b68 Add XPOS shumingma 2022-12-29 20:48:43 -0800
  • aa36203042 Fix multiway checkpointing shumingma 2022-12-27 22:32:02 -0800
  • 22438a8525 Update README.md gitnlp 2022-12-23 08:26:08 +0800
  • 21ed0056d7 Merge pull request #9 from MatthewChang/fix_output_projection_decoder Shuming Ma 2022-12-22 11:19:25 +0800
  • adcd995595 fix a bug which overrides the default constructed output_projection when none is passed in Matthew Chang 2022-12-21 16:24:44 -0600
  • 7e12b582e4 Support latest fairseq shumingma 2022-12-15 03:44:15 -0800
  • 2518ea030c Fix example fsdp shumingma 2022-12-08 04:20:27 -0800
  • 6d62bbbf67 Merge pull request #8 from buaahsh/main Shuming Ma 2022-12-06 19:06:26 +0800
  • 2005ab1f26 don't need attn weight in decoder buaahsh 2022-12-06 18:31:17 +0800
  • be167b3dda Add an example for vocab shumingma 2022-12-01 20:40:09 -0800
  • 7b29d32f03 Remove unused parameters shumingma 2022-11-29 21:36:03 -0800
  • 5adbe971cf Merge pull request #5 from kashif/typo Shuming Ma 2022-11-29 18:00:13 +0800
  • e8be99f8f1 fix typo Kashif Rasul 2022-11-29 10:48:56 +0100
  • 559b5fdf56 Merge pull request #4 from kashif/kashif-patch-1 Shuming Ma 2022-11-29 12:21:53 +0800
  • c69aba2a73 fix call to activation_fn Kashif Rasul 2022-11-29 00:11:38 +0100
  • be14bc23a1 typo Kashif Rasul 2022-11-29 00:11:02 +0100
  • e7d5ec2ad7 remove lambda Kashif Rasul 2022-11-29 00:02:26 +0100
  • c0ad46d7b8 Update README.md gitnlp 2022-11-28 22:29:46 +0800
  • 800ea8d39f Update README.md gitnlp 2022-11-27 22:45:31 +0800
  • 8dd8055826 Update README.md gitnlp 2022-11-27 22:38:02 +0800
  • 7eca1a531c Code reformatting shumingma 2022-11-26 09:01:02 -0800
  • 1354614d44 Update config file shumingma 2022-11-26 08:15:08 -0800
  • 994e4665a2 flake8 lint checks shumingma 2022-11-26 08:10:15 -0800