(QoL improvements for) DLAS - A configuration-driven trainer for generative models

James Betker 5189b11dac Add combined dataset for training across multiple datasets		2020-09-11 08:44:06 -06:00
.idea	Update requirements	2020-08-03 16:57:56 -06:00
codes	Add combined dataset for training across multiple datasets	2020-09-11 08:44:06 -06:00
.flake8	mmsr	2019-08-23 21:42:47 +08:00
.gitignore	Add GPU mem tracing module	2020-06-14 11:02:54 -06:00
.style.yapf	mmsr	2019-08-23 21:42:47 +08:00
LICENSE	Initial commit	2019-08-23 21:04:30 +08:00
README.md	README update	2020-05-15 14:03:44 -06:00
sandbox.py	Misc	2020-08-12 08:46:15 -06:00

README.md

MMSR

MMSR is an open source image and video super-resolution toolbox based on PyTorch. It is a part of the open-mmlab project developed by Multimedia Laboratory, CUHK. MMSR is based on our previous projects: BasicSR, ESRGAN, and EDVR.

My (@neonbjb) Modifications

After tinkering with MMSR, I came to like how the codebase is laid out and the general practices it uses. I have since extended it to more general use cases and implemented several GAN training features. The additions are too many to list in full, but I'll give it a shot:

FP16 support.
Alternative dataset support (notably a disjoint dataset for training a generator to style-transfer between imagesets).
Addition of several new architectures, including a ResNet-based discriminator, a downsampling generator (for training image corruptors), and a fix-and-upsample generator.
Fixup resblock support. Fixup resists the exploding gradients that would otherwise necessitate batch norm. Most of the Fixup architectures can be trained with BN turned off, though they take longer to train and occasionally diverge in FP16 mode.
Batch testing for performing generator augmentation on large sets of images.
Model swapout during training - randomly select a past D or G and substitute it in for a short time to increase variance on the respective model.
Random noise added to the inputs of both the discriminator and the generator. The discriminator noise has a decay.
Decaying the influence of the feature loss.
"Corruption" generators which can alter an input before it is fed through the SRGAN pipeline.
Outputting "state" images which are very useful in debugging what is actually going on in the pipeline.
Skip layers between the generator and discriminator.
Support for any number of image resolutions into the discriminators. The original MMSR only accepted 128x128 images.
"Megabatches" - gradient accumulation across multiple batches before performing an optimizer step.
Image cropping can be disabled. I prefer to do this in preprocessing.
Tensorboard logs for an experiment are cleared out when the experiment is restarted anew.
A LOT more data is logged to tensorboard.
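
The "megabatches" item above is ordinary gradient accumulation. Below is a minimal sketch of the idea in plain Python, using a one-parameter linear model so it runs without PyTorch. It is not this repo's code: `mse_grad`, `train_with_megabatches`, and `accum_steps` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]  # (x, y) pairs


def mse_grad(w: float, batch: Batch) -> float:
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_with_megabatches(w: float, batches: List[Batch],
                           accum_steps: int, lr: float) -> float:
    """Take one SGD step per `accum_steps` batches (one "megabatch").

    Trailing batches that do not fill a whole megabatch are ignored
    in this sketch.
    """
    grad_sum = 0.0
    for i, batch in enumerate(batches, start=1):
        # Scale each batch gradient so the accumulated value is the
        # mean gradient over the whole megabatch.
        grad_sum += mse_grad(w, batch) / accum_steps
        if i % accum_steps == 0:
            w -= lr * grad_sum  # the single optimizer step
            grad_sum = 0.0      # equivalent of zeroing the gradients
    return w


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    # Two batches of two samples, accumulated into one step.
    w_accum = train_with_megabatches(0.0, [data[:2], data[2:]], 2, 0.01)
    # One step on all four samples at once gives the same weight.
    w_full = 0.0 - 0.01 * mse_grad(0.0, data)
    print(w_accum, w_full)
```

With equally sized batches, the accumulated step matches a single step on the combined batch. In PyTorch the same pattern is usually written as calling `backward()` on each batch's scaled loss and calling `optimizer.step()` and `optimizer.zero_grad()` once per megabatch.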

Note that this codebase is far from clean. I've notably broken LMDB support in a couple of places. Likely everything other than SRGAN doesn't work too well anymore either. I will get around to documenting all this in the near future once the repo stabilizes a bit. For now, you're on your own!

Dependencies and Installation

Python 3 (Anaconda recommended)
PyTorch >= 1.1
NVIDIA GPU + CUDA
Deformable Convolution. We use mmdetection's dcn implementation. Please compile it first:
```
cd ./codes/models/archs/dcn
python setup.py develop
```
Python packages: pip install -r requirements.txt
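
To sanity-check the requirements listed above before training, something like the following sketch may help. The helper names `parse_version` and `check_environment` are hypothetical and not part of this repo; the only PyTorch calls used are `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util
import itertools
import sys
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string like '1.13.1+cu117' into (1, 13, 1)."""
    parts = []
    # Drop any local build suffix such as '+cu117' before splitting.
    for piece in text.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(min_torch: Tuple[int, ...] = (1, 1)) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch  # imported lazily so the check works without PyTorch
    if parse_version(torch.__version__) < min_torch:
        problems.append("PyTorch >= 1.1 is required, found " + torch.__version__)
    if not torch.cuda.is_available():
        problems.append("No CUDA-capable GPU is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

This does not check that the deformable convolution extension compiled; a failed build would only surface when a model that uses it is imported.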

Dataset Preparation

We use datasets in LMDB format for faster IO speed. Please refer to DATASETS.md for more details.

Training and Testing

Please see the wiki page Training and Testing for basic usage.

Model Zoo and Baselines

Results and pre-trained models are available on the wiki page Model Zoo.

Contributing

We appreciate all contributions. Please refer to mmdetection for the contributing guidelines.

Python code style
We adopt PEP8 as the preferred code style. We use flake8 as the linter and yapf as the formatter. Please upgrade to the latest yapf (>=0.27.0) and refer to the yapf configuration and flake8 configuration.

Before you create a PR, make sure that your code lints and is formatted by yapf.

License

This project is released under the Apache 2.0 license.