- Most branches get their own noise vector now.
- First attention branch has the intended sole purpose of raw image processing
- Remove norms from joiner block
Basically just cleaning up the code, removing some bad conventions,
and reducing complexity somewhat so that I can play around with
this arch a bit more easily.
Also makes all processing blocks have a conformant signature.
Alters ExpansionBlock to perform a processing conv on the passthrough
before the conjoin operation - this will break backwards compatibilty with SRG2.
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
Relu produced good performance gains over LeakyRelu, but
GAN performance degraded significantly. Try SiLU as an alternative
to see if it's the leaky-ness we are looking for or the smooth activation
curvature.
- Add LowDimRRDB; essentially a "normal RRDB" but the RDB blocks process at a low dimension using PixelShuffle
- Add switching wrappers around it
- Add support for switching on top of multi-headed inputs and outputs
- Moves PixelUnshuffle to arch_util
Renames AttentiveRRDB to SwitchedRRDB. Moves SwitchedConv to
an external repo (neonbjb/switchedconv). Switchs RDB blocks instead
of conv blocks. Works good!
After doing some thinking and reading on the subject, it occurred to me that
I was treating the generator like a discriminator by focusing the network
complexity at the feature levels. It makes far more sense to process each conv
level equally for the generator, hence the FlatProcessorNet in this commit. This
network borrows some of the residual pass-through logic from RRDB which makes
the gradient path exceptionally short for pretty much all model parameters and
can be trained in O1 optimization mode without overflows again.