diff --git a/README.md b/README.md
index 9ddda06..d3f5b3e 100644
--- a/README.md
+++ b/README.md
@@ -67,17 +67,21 @@ We also support the `Decoder` architecture and the `EncoderDecoder` architecture
 ## Key Features
 - [DeepNorm to improve the training stability of Post-LayerNorm Transformers](https://arxiv.org/abs/2203.00555)
-  * enabled by setting *deepnorm=True* in the `Config` class. 
+  * enabled by setting *deepnorm=True* in the `Config` class.
+  * It adjusts both the residual connection and the initialization method according to the model architecture (i.e., encoder, decoder, or encoder-decoder).
 - [SubLN for the model generality and the training stability](https://arxiv.org/abs/2210.06423)
-  * enabled by *subln=True*. This is enabled by default. 
+  * enabled by *subln=True*. This is enabled by default.
+  * It introduces an extra LayerNorm into each sublayer and adjusts the initialization according to the model architecture.
   * Note that SubLN and DeepNorm cannot be used in one single model.
 - [X-MoE: efficient and finetunable sparse MoE modeling](https://arxiv.org/abs/2204.09179)
-  * enabled by *use_xmoe=True*. 
+  * enabled by *use_xmoe=True*.
+  * It replaces every *moe_freq*-th `FeedForwardNetwork` layer with an X-MoE layer.
 - [Multiway architecture for multimodality](https://arxiv.org/abs/2208.10442)
   * enabled by *multiway=True*.
+  * It provides a pool of Transformer parameters used for different modalities.
 - [Relative position bias](https://arxiv.org/abs/1910.10683)
   * enabled by adjusting *rel_pos_buckets* and *max_rel_pos*.
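
The flag semantics described in the patch above can be sketched as follows. This is a minimal, hypothetical `SketchConfig` written only to illustrate the documented constraints (SubLN on by default, SubLN and DeepNorm mutually exclusive, `moe_freq` controlling X-MoE placement); it is not the library's actual `Config` class.

```python
from dataclasses import dataclass

@dataclass
class SketchConfig:
    """Hypothetical stand-in for the README's `Config` flags."""
    deepnorm: bool = False     # DeepNorm: adjusts residuals and init per architecture
    subln: bool = True         # SubLN is enabled by default
    use_xmoe: bool = False     # X-MoE sparse MoE layers
    moe_freq: int = 0          # every moe_freq-th FeedForwardNetwork becomes X-MoE
    multiway: bool = False     # Multiway architecture for multimodality
    rel_pos_buckets: int = 0   # relative position bias settings
    max_rel_pos: int = 0

    def __post_init__(self):
        # The README notes SubLN and DeepNorm cannot be used in one single model.
        if self.deepnorm and self.subln:
            raise ValueError("SubLN and DeepNorm cannot be used in the same model")

# Using DeepNorm therefore requires turning the default SubLN off explicitly.
cfg = SketchConfig(deepnorm=True, subln=False)
```

Under these assumptions, constructing `SketchConfig(deepnorm=True)` alone would raise, since `subln` defaults to `True`.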