Also makes all processing blocks have a conformant signature.
Alters ExpansionBlock to perform a processing conv on the passthrough
before the conjoin operation - this will break backwards compatibilty with SRG2.
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
- Get rid of forwards(), it makes numeric_stability.py not work properly.
- Do stability auditing across layers.
- Upsample last instead of first, work in much higher dimensionality for transforms.
Move to a fully fixup residual network for the switch (no
batch norms). Fix a bunch of other small bugs. Add in a
temporary latent feed-forward from the bottom of the
switch. Fix several initialization issues.
- Just use resnet blocks for the multiplexer trunk of the generator
- Every block initializes itself, rather than everything at the end
- Cleans up some messy parts of the architecture, including unnecessary
kernel sizes and places where BN is not used properly.
An evolution of SwitchedResidualGenerator, this variant nests attention
modules upon themselves to extend the representative capacity of the
model significantly.
Got rid of the converged multiplexer bases but kept the configurable architecture. The
new multiplexers look a lot like the old one.
Took some queues from the transformer architecture: translate image to a higher filter-space
and stay there for the duration of the models computation. Also perform convs after each
switch to allow the model to anneal issues that arise.
Found out that batch norm is causing the switches to init really poorly -
not using a significant number of transforms. Might be a great time to
re-consider using the attention norm, but for now just re-enable it.