diff --git a/datasets/DATASETS.md b/datasets/DATASETS.md
new file mode 100644
index 00000000..9d37f611
--- /dev/null
+++ b/datasets/DATASETS.md
@@ -0,0 +1,255 @@
+There are three kinds of datasets: the training, validation, and testing datasets. In image/video restoration, we usually do not explicitly distinguish between the validation and testing datasets, so we refer to them as the validation/testing dataset in our description.
+We recommend using the [LMDB](https://lmdb.readthedocs.io/en/release/) (Lightning Memory-Mapped Database) format for the training datasets, and reading images directly (using an image folder) during validation/testing. So there is no need to prepare LMDB files for the validation/testing datasets.
+
+---
+We organize the training datasets in LMDB format for **faster training IO speed**. If you do not want to use LMDB, you can also use the **image folder**.
+Besides the standard LMDB folder, we add an extra `meta_info.pkl` file to record the **meta information** of the dataset, such as the dataset name, the keys, and the resolution of each image in the dataset.
+
+Take the DIV2K dataset in LMDB format as an example; the folder structure and meta information are as follows:
+#### folder structure
+```
+- DIV2K800_sub.lmdb
+|--- data.mdb
+|--- lock.mdb
+|--- meta_info.pkl
+```
+#### meta information in `meta_info.pkl`
+`meta_info.pkl` is a Python-pickled dict.
+
+| Key | Value |
+|:----------:|:---------------------------------------------------------:|
+| name | `DIV2K800_sub_GT` |
+| keys | [ `0001_s001`, `0001_s002`, ..., `0800_s040` ] |
+| resolution | [ `3_480_480` ] |
+
+If all the images in the LMDB file have the same resolution, only one copy of `resolution` is stored. Otherwise, a `resolution` entry is stored for each key.
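+
+For illustration, here is a minimal sketch of reading one image back from such an LMDB file, assuming images are stored as raw `uint8` bytes under the keys listed in `meta_info.pkl` and that the `resolution` entries follow the `C_H_W` convention shown above (the path is a placeholder):
+```
+import pickle
+
+import lmdb
+import numpy as np
+
+# Load the meta information: dataset name, keys, and resolution(s).
+with open('DIV2K800_sub.lmdb/meta_info.pkl', 'rb') as f:
+    meta_info = pickle.load(f)
+
+# All images share one resolution entry here, e.g., '3_480_480' (C_H_W).
+C, H, W = [int(s) for s in meta_info['resolution'][0].split('_')]
+
+# Open the LMDB environment read-only and fetch one image by its key.
+env = lmdb.open('DIV2K800_sub.lmdb', readonly=True, lock=False)
+with env.begin() as txn:
+    buf = txn.get(meta_info['keys'][0].encode('ascii'))
+img = np.frombuffer(buf, dtype=np.uint8).reshape(H, W, C)
+print(meta_info['name'], img.shape)
+```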
+
+----
+
+## Table of Contents
+1. [Prepare DIV2K](#prepare-div2k)
+1. [Common Image SR Datasets](#common-image-sr-datasets)
+1. [Prepare Vimeo90K](#prepare-vimeo90k)
+1. [Prepare REDS](#prepare-reds)
+
+The following shows how to prepare the datasets in detail.
+It is recommended to symlink the dataset root to `$MMSR/datasets`. If your folder structure is different, you may need to change the corresponding paths in the config files.
+
+## Prepare DIV2K
+[DIV2K](https://data.vision.ee.ethz.ch/cvl/DIV2K/) is a widely-used dataset in image super-resolution. Many research works assume a MATLAB bicubic downsampling kernel. This assumption may not be practical, because the MATLAB bicubic kernel is not a good approximation of the implicit degradation kernels in real-world scenarios; a separate topic, **blind restoration**, deals with this gap.
+
+We provide a demo script for preparing the DIV2K X4 dataset.
+```
+cd codes/data_scripts
+bash prepare_DIV2K_x4_dataset.sh
+```
+The specific steps are as follows:
+
+**Step 1**: Download the GT images and corresponding LR images from the [official DIV2K website](https://data.vision.ee.ethz.ch/cvl/DIV2K/).
+Here are shortcuts for the download links:
+
+| Name | links (training) | links (validation)|
+|:----------:|:----------:|:----------:|
+|Ground-Truth|[DIV2K_train_HR](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_HR.zip)|[DIV2K_valid_HR](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_HR.zip)|
+|LRx2 (MATLAB bicubic)|[DIV2K_train_LR_bicubic_X2](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_LR_bicubic_X2.zip)|[DIV2K_valid_LR_bicubic_X2](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_LR_bicubic_X2.zip)|
+|LRx3 (MATLAB bicubic)|[DIV2K_train_LR_bicubic_X3](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_LR_bicubic_X3.zip)|[DIV2K_valid_LR_bicubic_X3](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_LR_bicubic_X3.zip)|
+|LRx4 (MATLAB bicubic)|[DIV2K_train_LR_bicubic_X4](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_LR_bicubic_X4.zip)|[DIV2K_valid_LR_bicubic_X4](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_LR_bicubic_X4.zip)|
+|LRx8 (MATLAB bicubic)|[DIV2K_train_LR_x8](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_LR_x8.zip)|[DIV2K_valid_LR_x8](http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_LR_x8.zip)|
+
+**Step 2**: Rename the downloaded LR images so that they have the same names as the GT images.
+Run the script `data_scripts/rename.py`. Remember to modify the folder path.
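+
+As a reference, here is a minimal sketch of such a renaming for the X4 LR images, whose official filenames carry an `x4` suffix (e.g., `0001x4.png` becomes `0001.png`); the folder path is a placeholder:
+```
+import glob
+import os
+
+# Placeholder path: point this at your downloaded LR folder.
+folder = '../../datasets/DIV2K/DIV2K_train_LR_bicubic/X4'
+
+# DIV2K LR images are named like '0001x4.png'; strip the scale suffix
+# so that each LR image shares its name with the corresponding GT image.
+for path in glob.glob(os.path.join(folder, '*x4.png')):
+    os.rename(path, path.replace('x4.png', '.png'))
+```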
+
+**Step 3 (optional)**: Generate low-resolution counterparts.
+If you have already downloaded the LR datasets, skip this step. Otherwise, you can use the script `data_scripts/generate_mod_LR_bic.m` or `data_scripts/generate_mod_LR_bic.py` to generate LR images. Make sure the LR and GT pairs have the same names.
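+
+If an approximation is acceptable, the sketch below generates LR images with OpenCV bicubic interpolation; note that `cv2.INTER_CUBIC` does not exactly reproduce MATLAB's `imresize` (the anti-aliasing differs), so prefer the provided scripts when exact MATLAB-bicubic degradation matters. Paths are placeholders:
+```
+import glob
+import os
+
+import cv2
+
+scale = 4
+GT_folder = '../../datasets/DIV2K/DIV2K800'          # placeholder
+LR_folder = '../../datasets/DIV2K/DIV2K800_bicLRx4'  # placeholder
+os.makedirs(LR_folder, exist_ok=True)
+
+for path in sorted(glob.glob(os.path.join(GT_folder, '*.png'))):
+    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
+    # Crop so that height and width are divisible by the scale.
+    h, w = img.shape[:2]
+    img = img[:h - h % scale, :w - w % scale]
+    lr = cv2.resize(img, (img.shape[1] // scale, img.shape[0] // scale),
+                    interpolation=cv2.INTER_CUBIC)
+    # Keep the same filename so that the LR and GT pairs match.
+    cv2.imwrite(os.path.join(LR_folder, os.path.basename(path)), lr)
+```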
+
+**Step 4**: Crop to sub-images.
+DIV2K has 2K-resolution images (e.g., 2048x1080), but the training patches are usually very small (e.g., 128x128), so it is wasteful to read a whole image when only a very small part of it is used. To accelerate the IO speed during training, we crop the 2K-resolution images into sub-images (here, 480x480 sub-images). You can skip this step if you have a high IO speed.
+Note that the size of the sub-images is different from the training patch size (`GT_size`) defined in the config file. Specifically, the 480x480 sub-images are stored in the LMDB files, and the dataloader further randomly crops them to `GT_size x GT_size` patches for training.
+Use the script `data_scripts/extract_subimages.py` with `mode = 'pair'`. Remember to modify the following configurations if you have different settings:
+```
+GT_folder = '../../datasets/DIV2K/DIV2K800'
+LR_folder = '../../datasets/DIV2K/DIV2K800_bicLRx4'
+save_GT_folder = '../../datasets/DIV2K/DIV2K800_sub'
+save_LR_folder = '../../datasets/DIV2K/DIV2K800_sub_bicLRx4'
+scale_ratio = 4
+```
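+
+For reference, here is a minimal sketch of the sliding-window cropping for the GT side (the LR side works the same with sizes divided by the scale); the window `step` is an assumed value giving 50% overlap:
+```
+import glob
+import os
+
+import cv2
+
+crop_size, step = 480, 240  # assumed: 480x480 windows, 50% overlap
+GT_folder = '../../datasets/DIV2K/DIV2K800'           # placeholder
+save_GT_folder = '../../datasets/DIV2K/DIV2K800_sub'  # placeholder
+os.makedirs(save_GT_folder, exist_ok=True)
+
+for path in sorted(glob.glob(os.path.join(GT_folder, '*.png'))):
+    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
+    h, w = img.shape[:2]
+    base = os.path.splitext(os.path.basename(path))[0]
+    idx = 0
+    for y in range(0, h - crop_size + 1, step):
+        for x in range(0, w - crop_size + 1, step):
+            idx += 1
+            sub = img[y:y + crop_size, x:x + crop_size]
+            # Names like '0001_s001.png' match the LMDB keys shown above.
+            cv2.imwrite(
+                os.path.join(save_GT_folder, f'{base}_s{idx:03d}.png'), sub)
+```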
+**Step 5**: Create LMDB files.
+You need to run the script `data_scripts/create_lmdb.py` separately for GT and LR images.
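+
+The core of that conversion might look like the sketch below, assuming images are written as raw `uint8` bytes under keys like `0001_s001`, together with a `meta_info.pkl` as described above; the paths, the dataset `name`, and `map_size` are placeholders:
+```
+import glob
+import os
+import pickle
+
+import cv2
+import lmdb
+
+img_folder = '../../datasets/DIV2K/DIV2K800_sub'      # placeholder
+lmdb_path = '../../datasets/DIV2K/DIV2K800_sub.lmdb'  # placeholder
+paths = sorted(glob.glob(os.path.join(img_folder, '*.png')))
+
+# map_size is the maximum database size in bytes; oversize it generously.
+env = lmdb.open(lmdb_path, map_size=1099511627776)
+keys, resolutions = [], []
+with env.begin(write=True) as txn:
+    for path in paths:
+        key = os.path.splitext(os.path.basename(path))[0]  # e.g., '0001_s001'
+        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
+        h, w, c = img.shape
+        txn.put(key.encode('ascii'), img.tobytes())
+        keys.append(key)
+        resolutions.append(f'{c}_{h}_{w}')
+
+# Store a single resolution entry if all images share the same shape.
+if len(set(resolutions)) == 1:
+    resolutions = [resolutions[0]]
+meta_info = {'name': 'DIV2K800_sub_GT', 'keys': keys, 'resolution': resolutions}
+with open(os.path.join(lmdb_path, 'meta_info.pkl'), 'wb') as f:
+    pickle.dump(meta_info, f)
+```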
+
+**Step 6**: Test the dataloader with the script `data_scripts/test_dataloader.py`.
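+
+A quick consistency check, independent of the dataloader, can also confirm that the LMDB contents match the meta information (the path is a placeholder):
+```
+import os
+import pickle
+
+import lmdb
+
+lmdb_path = '../../datasets/DIV2K/DIV2K800_sub.lmdb'  # placeholder
+with open(os.path.join(lmdb_path, 'meta_info.pkl'), 'rb') as f:
+    meta_info = pickle.load(f)
+
+env = lmdb.open(lmdb_path, readonly=True, lock=False)
+with env.begin() as txn:
+    # The number of records should equal the number of keys.
+    assert txn.stat()['entries'] == len(meta_info['keys'])
+    assert txn.get(meta_info['keys'][0].encode('ascii')) is not None
+print('LMDB and meta_info are consistent:', meta_info['name'])
+```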
+
+This procedure also applies to other datasets, such as the 291 images dataset or your own custom datasets.
+
+If you use the DIV2K dataset, please cite:
+```
+@InProceedings{Agustsson_2017_CVPR_Workshops,
+ author = {Agustsson, Eirikur and Timofte, Radu},
+ title = {NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study},
+ booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
+ month = {July},
+ year = {2017}
+}
+```
+## Common Image SR Datasets
+We provide a list of common image super-resolution datasets. You can download the images from the official websites, Google Drive, or Baidu Drive.
+
+
+| Name | Datasets | Short Description | Google Drive | Baidu Drive |
+|:----------:|:----------:|:----------:|:----------:|:----------:|
+| Classical SR Training | T91 | 91 images for training | Google Drive | Baidu Drive |
+| | BSDS200 | A subset (train) of BSD500 for training | | |
+| | General100 | 100 images for training | | |
+| Classical SR Testing | Set5 | Set5 test dataset | | |
+| | Set14 | Set14 test dataset | | |
+| | BSDS100 | A subset (test) of BSD500 for testing | | |
+| | urban100 | 100 building images for testing (regular structures) | | |
+| | manga109 | 109 images of Japanese manga for testing | | |
+| | historical | 10 gray LR images without the ground-truth | | |
+| 2K Resolution | DIV2K | proposed in NTIRE17 (800 train and 100 validation) | Google Drive | Baidu Drive |
+| | Flickr2K | 2650 2K images from Flickr for training | | |
+| | DF2K | A merged training dataset of DIV2K and Flickr2K | | |
+| OST (Outdoor Scenes) | OST Training | 7 categories of images with rich textures | Google Drive | Baidu Drive |
+| | OST300 | 300 test images of outdoor scenes | | |
+| PIRM | PIRM | PIRM self-val, val, test datasets | Google Drive | Baidu Drive |