From 51abba7c8be0b6f37bfac1df820535f2320d21f7 Mon Sep 17 00:00:00 2001
From: gitnlp <36983436+gitnlp@users.noreply.github.com>
Date: Thu, 24 Nov 2022 09:29:34 +0800
Subject: [PATCH 1/2] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 9e5c774..1d2a6fb 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,8 @@ MIT License

-TorchScale is a PyTorch library that allows researchers and developeres to scale up Transformers efficiently and effectively.
-It has the implemetention of fundamental research to improve modeling generality and capability, as well as training stability and efficiency of scaling Transformers.
+TorchScale is a PyTorch library that allows researchers and developers to scale up Transformers efficiently and effectively.
+It has the implementation of fundamental research to improve modeling generality and capability, as well as training stability and efficiency of scaling Transformers.
 
 - Stability - [**DeepNet**](https://arxiv.org/abs/2203.00555): scaling Transformers to 1,000 Layers and beyond
 - Generality - [**Foundation Transformers (Magneto)**](https://arxiv.org/abs/2210.06423)
@@ -192,4 +192,4 @@ This project may contain trademarks or logos for projects, products, or services
 trademarks or logos is subject to and must follow
 [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
 Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
\ No newline at end of file
+Any use of third-party trademarks or logos are subject to those third-party's policies.
From 660a2914029e2224669ad30ae0461a7b72b25b4c Mon Sep 17 00:00:00 2001
From: Li Dong
Date: Thu, 24 Nov 2022 11:40:38 +0800
Subject: [PATCH 2/2] Update README.md xmoe bibtex

---
 README.md | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 1d2a6fb..f4bd087 100644
--- a/README.md
+++ b/README.md
@@ -154,21 +154,12 @@ If you find this repository useful, please consider citing our work:
 ```
 
 ```
-@article{xmoe,
-  author    = {Zewen Chi and
-               Li Dong and
-               Shaohan Huang and
-               Damai Dai and
-               Shuming Ma and
-               Barun Patra and
-               Saksham Singhal and
-               Payal Bajaj and
-               Xia Song and
-               Furu Wei},
-  title     = {On the Representation Collapse of Sparse Mixture of Experts},
-  journal   = {CoRR},
-  volume    = {abs/2204.09179},
-  year      = {2022}
+@inproceedings{xmoe,
+  title={On the Representation Collapse of Sparse Mixture of Experts},
+  author={Zewen Chi and Li Dong and Shaohan Huang and Damai Dai and Shuming Ma and Barun Patra and Saksham Singhal and Payal Bajaj and Xia Song and Xian-Ling Mao and Heyan Huang and Furu Wei},
+  booktitle={Advances in Neural Information Processing Systems},
+  year={2022},
+  url={https://openreview.net/forum?id=mWaYC6CZf5}
 }
 ```