Let the Generator get to a point where it is at least competing with the discriminator before firing off.
Backwards from most GAN architectures, but this one is a bit different from most.
Relu produced good performance gains over LeakyRelu, but
GAN performance degraded significantly. Try SiLU as an alternative
to see if it's the leaky-ness we are looking for or the smooth activation
curvature.
Something strange is going on. These networks do not respond to
discriminator gradients properly anymore. SRG1 did, however so
reverting back to last known good state to figure out why.
- Get rid of forwards(), it makes numeric_stability.py not work properly.
- Do stability auditing across layers.
- Upsample last instead of first, work in much higher dimensionality for transforms.
Move to a fully fixup residual network for the switch (no
batch norms). Fix a bunch of other small bugs. Add in a
temporary latent feed-forward from the bottom of the
switch. Fix several initialization issues.
- Just use resnet blocks for the multiplexer trunk of the generator
- Every block initializes itself, rather than everything at the end
- Cleans up some messy parts of the architecture, including unnecessary
kernel sizes and places where BN is not used properly.
An evolution of SwitchedResidualGenerator, this variant nests attention
modules upon themselves to extend the representative capacity of the
model significantly.
Got rid of the converged multiplexer bases but kept the configurable architecture. The
new multiplexers look a lot like the old one.
Took some queues from the transformer architecture: translate image to a higher filter-space
and stay there for the duration of the models computation. Also perform convs after each
switch to allow the model to anneal issues that arise.