- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
Some lessons learned:
- Biases are fairly important as a relief valve. They dont need to be everywhere, but
most computationally heavy branches should have a bias.
- GroupNorm in SPSR is not a great idea. Since image gradients are represented
in this model, normal means and standard deviations are not applicable. (imggrad
has a high representation of 0).
- Don't fuck with the mainline of any generative model. As much as possible, all
additions should be done through residual connections. Never pollute the mainline
with reference data, do that in branches. It basically leaves the mode untrainable.
SPSR_model really isn't that different from SRGAN_model. Rather than continuing to re-implement
everything I've done in SRGAN_model, port the new stuff from SPSR over.
This really demonstrates the need to refactor SRGAN_model a bit to make it cleaner. It is quite the
beast these days..
This is done by pre-training a feature net that predicts the features
of HR images from LR images. Then use the original feature network
and this new one in tandem to work only on LR/Gen images.
The logic is that the discriminator may be incapable of providing a truly
targeted loss for all image regions since it has to be too generic
(basically the same argument for the switched generator). So add some
switches in! See how it works!
- Swap multiple blocks in the image instead of just one. The discriminator was clearly
learning that most blocks have one region that needs to be fixed.
- Relax block size constraints. This was in place to gaurantee that the discriminator
signal was clean. Instead, just downsample the "loss image" with bilinear interpolation.
The result is noisier, but this is actually probably healthy for the discriminator.
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
- Removed a bunch of unnecessary image loggers. These were just consuming space and never being viewed
- Got rid of support of artificial var_ref support. The new pixdisc is what i wanted to implement then - it's much better.
- Add pixgan GAN mechanism. This is purpose-built for the pixdisc. It is intended to promote a healthy discriminator
- Megabatchfactor was applied twice on metrics, fixed that
Adds pix_gan (untested) which swaps a portion of the fake and real image with each other, then expects the discriminator
to properly discriminate the swapped regions.
doResizeLoss has a 50% chance to resize the LQ image to 50% size,
then back to original size. This is useful to training a generator to
recover these lost pixel values while also being able to do
repairs on higher resolution images during training.
- Makes skip connections between the generator and discriminator more
extensible by adding additional configuration options for them and supporting
1 and 0 skips.
- Places the temp/ directory with sample images from the training process appear
in the training directory instead of the codes/ directory.
This is a simpler resnet-based generator which performs mutations
on an input interspersed with interpolate-upsampling. It is a two
part generator:
1) A component that "fixes" LQ images with a long string of resnet
blocks. This component is intended to remove compression artifacts
and other noise from a LQ image.
2) A component that can double the image size. The idea is that this
component be trained so that it can work at most reasonable
resolutions, such that it can be repeatedly applied to itself to
perform multiple upsamples.
The motivation here is to simplify what is being done inside of RRDB.
I don't believe the complexity inside of that network is justified.
Add RRDBNetXL, which performs processing at multiple image sizes.
Add DiscResnet_passthrough, which allows passthrough of image at different sizes for discrimination.
Adjust the rest of the repo to allow generators that return more than just a single image.
This bad boy is for a workflow where you train a model on disjoint image sets to
downsample a "good" set of images like a "bad" set of images looks. You then
use that downsampler to generate a training set of paired images for supersampling.