diff --git a/README.md b/README.md
index 6c38163..fe66d75 100644
--- a/README.md
+++ b/README.md
@@ -69,6 +69,20 @@ We also support the `Decoder` architecture and the `EncoderDecoder` architecture
 >>> print(encdec)
 ```
 
+It takes only several lines of code to create a RetNet model:
+
+```python
+# Creating a RetNet model
+>>> import torch
+>>> from torchscale.architecture.config import RetNetConfig
+>>> from torchscale.architecture.retnet import RetNetDecoder
+
+>>> config = RetNetConfig(vocab_size=64000)
+>>> retnet = RetNetDecoder(config)
+
+>>> print(retnet)
+```
+
 ## Key Features
 
 - [DeepNorm to improve the training stability of Post-LayerNorm Transformers](https://arxiv.org/abs/2203.00555)
@@ -97,6 +111,9 @@ We also support the `Decoder` architecture and the `EncoderDecoder` architecture
 - [SparseClip: improving the gradient clipping for sparse MoE models](https://arxiv.org/abs/2211.13184)
   * we provide a [sample code](examples/fairseq/utils/sparse_clip.py) that can be easily adapted to the FairSeq (or other) repo.
 
+- [Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621)
+  * created by `config = RetNetConfig(vocab_size=64000)` and `retnet = RetNetDecoder(config)`.
+
 Most of the features above can be used by simply passing the corresponding parameters to the config. For example:
 
 ```python
@@ -111,7 +128,7 @@ Most of the features above can be used by simply passing the corresponding param
 
 ## Examples
 
-We have the examples of how to use TorchScale in the following scenarios/tasks:
+We have examples of how to use TorchScale in the following scenarios/tasks:
 
 - Language
 
@@ -199,6 +216,16 @@ If you find this repository useful, please consider citing our work:
 }
 ```
 
+```
+@article{retnet,
+  author={Yutao Sun and Li Dong and Shaohan Huang and Shuming Ma and Yuqing Xia and Jilong Xue and Jianyong Wang and Furu Wei},
+  title = {Retentive Network: A Successor to {Transformer} for Large Language Models},
+  journal = {ArXiv},
+  volume = {abs/2307.08621},
+  year = {2023}
+}
+```
+
 ## Contributing
 
 This project welcomes contributions and suggestions. Most contributions require you to agree to a
@@ -210,13 +237,11 @@ a CLA and decorate the PR appropriately (e.g., status check, comment). Simply fo
 provided by the bot. You will only need to do this once across all repos using our CLA.
 
 This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
-For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
+For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
 contact [Furu Wei](mailto:fuwei@microsoft.com) and [Shuming Ma](mailto:shumma@microsoft.com) with any additional questions or comments.
 
 ## Trademarks
 
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
-trademarks or logos is subject to and must follow
-[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
 Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
+Any use of third-party trademarks or logos is subject to those third-party's policies.