diff --git a/README.md b/README.md
index fe66d75..e757cbb 100644
--- a/README.md
+++ b/README.md
@@ -155,7 +155,7 @@ We plan to provide more examples regarding different tasks (e.g. vision pretrain
### Stability Evaluation

- +

The training curve is smooth by using TorchScale, while the baseline Transformer cannot converge.
@@ -163,7 +163,7 @@ The training curve is smooth by using TorchScale, while the baseline Transformer
### Scaling-up Experiments

- +

TorchScale supports arbitrary depths and widths, successfully scaling-up the models without pain.