## Key Features
- [DeepNorm to improve the training stability of Post-LayerNorm Transformers](https://arxiv.org/abs/2203.00555)
  * enabled by setting *deepnorm=True* in the `Config` class; a usage sketch follows below.
  * It adjusts both the residual connection and the initialization method according to the model architecture (i.e., encoder, decoder, or encoder-decoder).
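
  A minimal sketch of turning this on, assuming the `EncoderConfig`/`Encoder` entry points used in this repository's examples (the `vocab_size` value is illustrative):

  ```python
  from torchscale.architecture.config import EncoderConfig
  from torchscale.architecture.encoder import Encoder

  # SubLN is on by default and cannot be combined with DeepNorm
  # (see the SubLN notes below), so it is disabled explicitly.
  config = EncoderConfig(vocab_size=64000, deepnorm=True, subln=False)
  model = Encoder(config)
  ```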
- [SubLN for model generality and training stability](https://arxiv.org/abs/2210.06423)
  * enabled by *subln=True* (this is the default).
  * It introduces another LayerNorm into each sublayer and adjusts the initialization according to the model architecture; a toy sketch follows below.
  * Note that SubLN and DeepNorm cannot be used in a single model.
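
  Conceptually, the extra normalization sits inside each sublayer. A toy sketch of a feed-forward sublayer with the additional LayerNorm (illustrative only, not this library's actual module; the real version also rescales the initialization):

  ```python
  import torch
  import torch.nn as nn

  class SubLNFeedForward(nn.Module):
      """Toy pre-LN feed-forward sublayer with the extra SubLN-style LayerNorm."""

      def __init__(self, dim: int, hidden_dim: int):
          super().__init__()
          self.norm = nn.LayerNorm(dim)               # usual sublayer LayerNorm
          self.fc1 = nn.Linear(dim, hidden_dim)
          self.inner_norm = nn.LayerNorm(hidden_dim)  # the LayerNorm SubLN adds
          self.fc2 = nn.Linear(hidden_dim, dim)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          residual = x
          x = self.norm(x)
          x = torch.relu(self.fc1(x))
          x = self.inner_norm(x)
          x = self.fc2(x)
          return residual + x
  ```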
- [X-MoE: efficient and finetunable sparse MoE modeling](https://arxiv.org/abs/2204.09179)
  * enabled by *use_xmoe=True*.
  * It replaces every *moe_freq*-th `FeedForwardNetwork` layer with an X-MoE layer; see the sketch below.
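
  A small sketch of that replacement rule (the strings stand in for real modules, and counting so that layers *moe_freq*, 2·*moe_freq*, … are the ones replaced is an assumption, not the library's internals):

  ```python
  def plan_sublayers(num_layers: int, moe_freq: int) -> list:
      """Illustrative: mark every moe_freq-th feed-forward block as X-MoE."""
      plan = []
      for i in range(num_layers):
          if moe_freq > 0 and (i + 1) % moe_freq == 0:
              plan.append("x-moe")
          else:
              plan.append("feed-forward")
      return plan

  print(plan_sublayers(num_layers=6, moe_freq=2))
  # ['feed-forward', 'x-moe', 'feed-forward', 'x-moe', 'feed-forward', 'x-moe']
  ```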
- [Multiway architecture for multimodality](https://arxiv.org/abs/2208.10442)
  * enabled by *multiway=True*.
  * It provides a pool of Transformer parameters used for different modalities; a toy sketch follows below.
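
  A toy illustration of the idea: keep one set of feed-forward parameters per modality and route each input through its modality's copy (all names here are illustrative, not this library's internals):

  ```python
  import torch
  import torch.nn as nn

  class ToyMultiwayFFN(nn.Module):
      """Toy multiway module: a pool of per-modality feed-forward parameters."""

      def __init__(self, dim: int, hidden_dim: int, modalities=("text", "vision")):
          super().__init__()
          self.pool = nn.ModuleDict({
              m: nn.Sequential(
                  nn.Linear(dim, hidden_dim),
                  nn.GELU(),
                  nn.Linear(hidden_dim, dim),
              )
              for m in modalities
          })

      def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
          # Each modality is served by its own parameters from the pool.
          return self.pool[modality](x)
  ```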
- [Relative position bias](https://arxiv.org/abs/1910.10683)
  * enabled by adjusting *rel_pos_buckets* and *max_rel_pos*; see the example below.
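
  For example, again assuming the `EncoderConfig`/`Encoder` entry points (the bucket and distance values are illustrative):

  ```python
  from torchscale.architecture.config import EncoderConfig
  from torchscale.architecture.encoder import Encoder

  # Illustrative: bucket relative positions up to distance 128 into 32 bins.
  config = EncoderConfig(vocab_size=64000, rel_pos_buckets=32, max_rel_pos=128)
  model = Encoder(config)
  ```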