The logic is that the discriminator may be incapable of providing a truly
targeted loss for all image regions since it has to be too generic
(basically the same argument for the switched generator). So add some
switches in! See how it works!
The latest discriminator architecture was already pretty much a unet. This
one makes that official and uses shared layers. It also upsamples one additional
time and throws out the lowest upsampling result.
The intent is to delete the old vgg pixdisc, but I'll keep it around for a bit since
I'm still trying out a few models with it.
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
Something strange is going on. These networks do not respond to
discriminator gradients properly anymore. SRG1 did, however so
reverting back to last known good state to figure out why.
An evolution of SwitchedResidualGenerator, this variant nests attention
modules upon themselves to extend the representative capacity of the
model significantly.
- Add filters_mid spec which allows a expansion->squeeze for the transformation layers.
- Add scale and bias AFTER the switch
- Remove identity transform (models were converging on this)
- Move attention image generation and temperature setting into new function which gets called every step with a save path
The concept here is to use switching to split the generator into two functions:
interpretation and transformation. Transformation is done at the pixel level by
relatively simple conv layers, while interpretation is computed at various levels
by far more complicated conv stacks. The two are merged using the switching
mechanism.
This architecture is far less computationally intensive that RRDB.
- Add LowDimRRDB; essentially a "normal RRDB" but the RDB blocks process at a low dimension using PixelShuffle
- Add switching wrappers around it
- Add support for switching on top of multi-headed inputs and outputs
- Moves PixelUnshuffle to arch_util
Renames AttentiveRRDB to SwitchedRRDB. Moves SwitchedConv to
an external repo (neonbjb/switchedconv). Switchs RDB blocks instead
of conv blocks. Works good!
- Makes skip connections between the generator and discriminator more
extensible by adding additional configuration options for them and supporting
1 and 0 skips.
- Places the temp/ directory with sample images from the training process appear
in the training directory instead of the codes/ directory.
This network is just a fixed (pre-trained) generator
that performs a corruption transformation that the
generator-in-training is expected to undo alongside
SR.
Implements a ResGenv2 architecture which slightly increases the complexity
of the final output layer but causes it to be shared across all skip outputs.
It's been a tough day figuring out WTH is going on with my discriminators.
It appears the raw FixUp discriminator can get into an "defective" state where
they stop trying to learn and just predict as close to "0" D_fake and D_real as
possible. In this state they provide no feedback to the generator and never
recover. Adding batch norm back in seems to fix this so it must be some sort
of parameterization error.. Should look into fixing this in the future.
This is a simpler resnet-based generator which performs mutations
on an input interspersed with interpolate-upsampling. It is a two
part generator:
1) A component that "fixes" LQ images with a long string of resnet
blocks. This component is intended to remove compression artifacts
and other noise from a LQ image.
2) A component that can double the image size. The idea is that this
component be trained so that it can work at most reasonable
resolutions, such that it can be repeatedly applied to itself to
perform multiple upsamples.
The motivation here is to simplify what is being done inside of RRDB.
I don't believe the complexity inside of that network is justified.