Commit Graph

376 Commits

Author SHA1 Message Date
James Betker
9ccf771629 Fix feature validation, wrong device
Only shows up in distributed training for some reason.
2020-07-23 10:16:34 -06:00
James Betker
a7541b6d8d Fix illegal tb_logger use in distributed training 2020-07-23 09:14:01 -06:00
James Betker
bba283776c Enable find_unused_parameters for DistributedDataParallel
attention_norm has some parameters which are not used to compute grad,
which is causing failures in the distributed case.
2020-07-23 09:08:13 -06:00
James Betker
dbf6147504 Add switched discriminator
The logic is that the discriminator may be incapable of providing a truly
targeted loss for all image regions since it has to be too generic
(basically the same argument for the switched generator). So add some
switches in! See how it works!
2020-07-22 20:52:59 -06:00
James Betker
8a0a1569f3 Enable force_multiple in LQ_dataset 2020-07-22 20:51:16 -06:00
James Betker
106b8da315 Assert that temperature is set properly in eval mode. 2020-07-22 20:50:59 -06:00
James Betker
c74b9ee2e4 Add a way to disable grad on portions of the generator graph to save memory 2020-07-22 11:40:42 -06:00
James Betker
e3adafbeac Add convert_model.py and a hacky way to add extra layers to a model 2020-07-22 11:39:45 -06:00
James Betker
7f7e17e291 Update feature discriminator further
Move the feature/disc losses closer and add a feature computation layer.
2020-07-20 20:54:45 -06:00
James Betker
46aa776fbb Allow feature discriminator unet to only output closest layer to feature output 2020-07-19 19:05:08 -06:00
James Betker
8a9f215653 Huge set of mods to support progressive generator growth 2020-07-18 14:18:48 -06:00
James Betker
47a525241f Make attention norm optional 2020-07-18 07:24:02 -06:00
James Betker
ad97a6a18a Progressive SRG first check-in 2020-07-18 07:23:26 -06:00
James Betker
b08b1cad45 Fix feature decay 2020-07-16 23:27:06 -06:00
James Betker
3e7a83896b Fix pixgan debugging issues 2020-07-16 11:45:19 -06:00
James Betker
a1bff64d1a More fixes 2020-07-16 10:48:48 -06:00
James Betker
240f254263 More loss fixes 2020-07-16 10:45:50 -06:00
James Betker
6cfa67d831 Fix featuredisc broadcast error 2020-07-16 10:18:30 -06:00
James Betker
8d061a2687 Add u-net discriminator with feature output 2020-07-16 10:10:09 -06:00
James Betker
0c4c388e15 Remove dualoutputsrg
Good idea, didn't pan out.
2020-07-16 10:09:24 -06:00
James Betker
4bcc409fc7 Fix loadSRG2 typo 2020-07-14 10:20:53 -06:00
James Betker
1e4083a35b Apply temperature mods to all SRG models
(Honestly this needs to be base classed at this point)
2020-07-14 10:19:35 -06:00
James Betker
7659bd6818 Fix temperature equation 2020-07-14 10:17:14 -06:00
James Betker
853468ef82 Allow legacy state_dicts in srg2 2020-07-14 10:03:45 -06:00
James Betker
1b1431133b Add DualOutputSRG
Also removes the old multi-return mechanism that Generators support.
Also fixes AttentionNorm.
2020-07-14 09:28:24 -06:00
James Betker
a2285ff2ee Scale anorm by transform count 2020-07-13 08:49:09 -06:00
James Betker
dd0bbd9a7c Enable AttentionNorm on SRG2 2020-07-13 08:38:17 -06:00
James Betker
4c0f770f2a Fix inverted temperature curve bug 2020-07-12 11:02:50 -06:00
James Betker
14d23b9d20 Fixes, do fake swaps less often in pixgan discriminator 2020-07-11 21:22:11 -06:00
James Betker
ba6187859a err5 2020-07-10 23:02:56 -06:00
James Betker
902527dfaa err4 2020-07-10 23:00:21 -06:00
James Betker
020b3361fa err3 2020-07-10 22:57:34 -06:00
James Betker
b3a2c21250 err2 2020-07-10 22:52:02 -06:00
James Betker
716433db1f err1 2020-07-10 22:50:56 -06:00
James Betker
ef9f1307eb Sometimes don't use compression artifacts 2020-07-10 22:25:53 -06:00
James Betker
0b7193392f Implement unet disc
The latest discriminator architecture was already pretty much a unet. This
one makes that official and uses shared layers. It also upsamples one additional
time and throws out the lowest upsampling result.

The intent is to delete the old vgg pixdisc, but I'll keep it around for a bit since
I'm still trying out a few models with it.
2020-07-10 16:24:42 -06:00
James Betker
812c684f7d Update pixgan swap algorithm
- Swap multiple blocks in the image instead of just one. The discriminator was clearly
  learning that most blocks have one region that needs to be fixed.
- Relax block size constraints. This was in place to gaurantee that the discriminator
  signal was clean. Instead, just downsample the "loss image" with bilinear interpolation.
  The result is noisier, but this is actually probably healthy for the discriminator.
2020-07-10 15:56:14 -06:00
James Betker
33ca3832e1 Move ExpansionBlock to arch_util
Also makes all processing blocks have a conformant signature.

Alters ExpansionBlock to perform a processing conv on the passthrough
before the conjoin operation - this will break backwards compatibilty with SRG2.
2020-07-10 15:53:41 -06:00
James Betker
5e8b52f34c Misc changes 2020-07-10 09:45:48 -06:00
James Betker
5f2c722a10 SRG2 revival
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
2020-07-09 17:34:51 -06:00
James Betker
12da993da8 More fixes... 2020-07-08 22:07:09 -06:00
James Betker
7d6eb28b87 More fixes 2020-07-08 22:00:57 -06:00
James Betker
b2507be13c Fix up pixgan loss and pixdisc 2020-07-08 21:27:48 -06:00
James Betker
26a4a66d1c Bug fixes and new gan mechanism
- Removed a bunch of unnecessary image loggers. These were just consuming space and never being viewed
- Got rid of support of artificial var_ref support. The new pixdisc is what i wanted to implement then - it's much better.
- Add pixgan GAN mechanism. This is purpose-built for the pixdisc. It is intended to promote a healthy discriminator
- Megabatchfactor was applied twice on metrics, fixed that

Adds pix_gan (untested) which swaps a portion of the fake and real image with each other, then expects the discriminator
to properly discriminate the swapped regions.
2020-07-08 17:40:26 -06:00
James Betker
4305be97b4 Update log metrics
They should now be universal regardless of job configuration
2020-07-07 15:33:22 -06:00
James Betker
8a4eb8241d SRG3 work
Operates on top of a pre-trained SpineNET backbone (trained on CoCo 2017 with RetinaNet)

This variant is extremely shallow.
2020-07-07 13:46:40 -06:00
James Betker
0acad81035 More SRG2 adjustments.. 2020-07-06 22:40:40 -06:00
James Betker
086b2f0570 More bugs 2020-07-06 22:28:07 -06:00
James Betker
d4d4f85fc0 Bug fixes 2020-07-06 22:25:40 -06:00
James Betker
3c31bea1ac SRG2 architectural changes 2020-07-06 22:22:29 -06:00
James Betker
9a1c3241f5 Switch discriminator to groupnorm 2020-07-06 20:59:59 -06:00
James Betker
60c6352843 Misc 2020-07-06 20:44:07 -06:00
James Betker
6beefa6d0c PixDisc - Add two more levels of losses coming from this gen at higher resolutions 2020-07-06 11:15:52 -06:00
James Betker
2636d3b620 Fix assertion error 2020-07-06 09:23:53 -06:00
James Betker
8f92c0a088 Interpolate attention well before softmax 2020-07-06 09:18:30 -06:00
James Betker
72f90cabf8 More pixdisc fixes 2020-07-05 22:03:16 -06:00
James Betker
909007ee6a Add G_warmup
Let the Generator get to a point where it is at least competing with the discriminator before firing off.

Backwards from most GAN architectures, but this one is a bit different from most.
2020-07-05 21:58:35 -06:00
James Betker
a47a5dca43 Fix pixdisc bug 2020-07-05 21:57:52 -06:00
James Betker
d0957bd7d4 Alter weight initialization for transformation blocks 2020-07-05 17:32:46 -06:00
James Betker
16d1bf6dd7 Replace ConvBnRelus in SRG2 with Silus 2020-07-05 17:29:20 -06:00
James Betker
10f7e49214 Add ConvBnSilu to replace ConvBnRelu
Relu produced good performance gains over LeakyRelu, but
GAN performance degraded significantly. Try SiLU as an alternative
to see if it's the leaky-ness we are looking for or the smooth activation
curvature.
2020-07-05 13:39:08 -06:00
James Betker
9934e5d082 Move SRG1 to identical to new 2020-07-05 08:49:34 -06:00
James Betker
416538f31c SRG1 conjoined except ConvBnRelu 2020-07-05 08:44:17 -06:00
James Betker
c58c2b09ca Back to remove all biases (looks like a ConvBnRelu made its way in..) 2020-07-04 22:41:02 -06:00
James Betker
86cda86e94 Re-add biases, also add new init
A/B testing where we lost our GAN competitiveness.
2020-07-04 22:24:42 -06:00
James Betker
b03741f30e Remove all biases from generator
Continuing to investigate loss of GAN competitiveness, this is a big difference
between "old" SRG1 and "new".
2020-07-04 22:19:55 -06:00
James Betker
726e946e79 Turn BN off in SRG1
This wont work well but just testing if GAN performance comes back
2020-07-04 14:51:27 -06:00
James Betker
0ee39d419b OrderedDict not needed 2020-07-04 14:09:27 -06:00
James Betker
9048105b72 Break out SRG1 as separate network
Something strange is going on. These networks do not respond to
discriminator gradients properly anymore. SRG1 did, however so
reverting back to last known good state to figure out why.
2020-07-04 13:28:50 -06:00
James Betker
188de5e15a Misc changes 2020-07-04 13:22:50 -06:00
James Betker
510b2f887d Remove RDB from srg2
Doesnt seem to work so great.
2020-07-03 22:31:20 -06:00
James Betker
77d3765364 Fix new feature loss calc 2020-07-03 22:20:13 -06:00
James Betker
ed6a15e768 Add feature to dataset which allows it to force images to be a certain size. 2020-07-03 15:19:16 -06:00
James Betker
da4335c25e Add a feature-based validation test 2020-07-03 15:18:57 -06:00
James Betker
703dec4472 Add SpineNet & integrate with SRG
New version of SRG uses SpineNet for a switch backbone.
2020-07-03 12:07:31 -06:00
James Betker
3ed7a2b9ab Move ConvBnRelu/Lelu to arch_util 2020-07-03 12:06:38 -06:00
James Betker
ea9c6765ca Move train imports into init_dist 2020-07-02 15:11:21 -06:00
James Betker
e9ee67ff10 Integrate RDB into SRG
The last RDB for each cluster is switched.
2020-07-01 17:19:55 -06:00
James Betker
6ac6c95177 Fix scaling bug 2020-07-01 16:42:27 -06:00
James Betker
30653181ba Experiment: get rid of post_switch_conv 2020-07-01 16:30:40 -06:00
James Betker
17191de836 Experiment: bring initialize_weights back again
Something really strange going on here..
2020-07-01 15:58:13 -06:00
James Betker
d1d573de07 Experiment: new init and post-switch-conv 2020-07-01 15:25:54 -06:00
James Betker
480d1299d7 Remove RRDB with switching
This idea never really panned out, removing it.
2020-07-01 12:08:32 -06:00
James Betker
e2398ac83c Experiment: revert initialization changes 2020-07-01 12:08:09 -06:00
James Betker
78276afcaa Experiment: Back to lelu 2020-07-01 11:43:25 -06:00
James Betker
b945021c90 SRG v2 - Move to Relu, rely on Module-based initialization 2020-07-01 11:33:32 -06:00
James Betker
ee6443ad7d Add numeric stability computation script 2020-07-01 11:30:34 -06:00
James Betker
c0bb123504 Misc changes 2020-07-01 11:28:23 -06:00
James Betker
604763be68 NSG r7
Converts the switching trunk to a VGG-style network to make it more comparable
to SRG architectures.
2020-07-01 09:54:29 -06:00
James Betker
87f1e9c56f Invert ResGen2 to operate in LR space 2020-06-30 20:57:40 -06:00
James Betker
e07d8abafb NSG rev 6
- Disable style passthrough
- Process multiplexers starting at base resolution
2020-06-30 20:47:26 -06:00
James Betker
3ce1a1878d NSG improvements (r5)
- Get rid of forwards(), it makes numeric_stability.py not work properly.
- Do stability auditing across layers.
- Upsample last instead of first, work in much higher dimensionality for transforms.
2020-06-30 16:59:57 -06:00
James Betker
75f148022d Even more NSG improvements (r4) 2020-06-30 13:52:47 -06:00
James Betker
773753073f More NSG improvements (v3)
Move to a fully fixup residual network for the switch (no
batch norms). Fix a bunch of other small bugs. Add in a
temporary latent feed-forward from the bottom of the
switch. Fix several initialization issues.
2020-06-29 20:26:51 -06:00
James Betker
4b82d0815d NSG improvements
- Just use resnet blocks for the multiplexer trunk of the generator
- Every block initializes itself, rather than everything at the end
- Cleans up some messy parts of the architecture, including unnecessary
  kernel sizes and places where BN is not used properly.
2020-06-29 10:09:51 -06:00
James Betker
978036e7b3 Add NestedSwitchGenerator
An evolution of SwitchedResidualGenerator, this variant nests attention
modules upon themselves to extend the representative capacity of the
model significantly.
2020-06-28 21:22:05 -06:00
James Betker
6f2bc36c61 Distill_torchscript mods
Starts down the path of writing a custom trace that works using torch's hook mechanism.
2020-06-27 08:28:09 -06:00
James Betker
db08dedfe2 Add recover_tensorboard_log
Generates a tb_logger from raw console output. Useful for colab sessions
that crash.
2020-06-27 08:26:57 -06:00
James Betker
c8a670842e Missed networks.py in last commit 2020-06-25 18:36:06 -06:00
James Betker
407224eba1 Re-work SwitchedResgen2
Got rid of the converged multiplexer bases but kept the configurable architecture. The
new multiplexers look a lot like the old one.

Took some queues from the transformer architecture: translate image to a higher filter-space
and stay there for the duration of the models computation. Also perform convs after each
switch to allow the model to anneal issues that arise.
2020-06-25 18:17:05 -06:00
James Betker
42a10b34ce Re-enable batch norm on switch processing blocks
Found out that batch norm is causing the switches to init really poorly -
not using a significant number of transforms. Might be a great time to
re-consider using the attention norm, but for now just re-enable it.
2020-06-24 21:15:17 -06:00
James Betker
4001db1ede Add ConfigurableSwitchComputer 2020-06-24 19:49:37 -06:00
James Betker
83c3b8b982 Add parameterized noise injection into resgen 2020-06-23 10:16:02 -06:00
James Betker
0584c3b587 Add negative_transforms switch to resgen 2020-06-23 09:41:12 -06:00
James Betker
dfcbe5f2db Add capability to place additional conv into discriminator
This should allow us to support larger images sizes. May need
to add another one of these.
2020-06-23 09:40:33 -06:00
James Betker
bad33de906 Add simple resize to extract images 2020-06-23 09:39:51 -06:00
James Betker
030648f2bc Remove batchnorms from resgen 2020-06-22 17:23:36 -06:00
James Betker
68bcab03ae Add growth channel to switch_growths for flat networks 2020-06-22 10:40:16 -06:00
James Betker
3b81712c49 Remove BN from transforms 2020-06-19 16:52:56 -06:00
James Betker
61364ec7d0 Fix inverse temperature curve logic and add upsample factor 2020-06-19 09:18:30 -06:00
James Betker
0551139b8d Fix resgen temperature curve below 1
It needs to be inverted to maintain a true linear curve
2020-06-18 16:08:07 -06:00
James Betker
efc80f041c Save & load amp state 2020-06-18 11:38:48 -06:00
James Betker
2e3b6bad77 Log tensorboard directly into experiments directory 2020-06-18 11:33:02 -06:00
James Betker
778e7b6931 Add a double-step to attention temperature 2020-06-18 11:29:31 -06:00
James Betker
d2d5e097d5 Add profiling to SRGAN for testing timings 2020-06-18 11:29:10 -06:00
James Betker
45a900fafe Misc 2020-06-18 11:28:55 -06:00
James Betker
59b0533b06 Fix attimage step size 2020-06-17 18:45:24 -06:00
James Betker
645d0ca767 ResidualGen mods
- Add filters_mid spec which allows a expansion->squeeze for the transformation layers.
- Add scale and bias AFTER the switch
- Remove identity transform (models were converging on this)
- Move attention image generation and temperature setting into new function which gets called every step with a save path
2020-06-17 17:18:28 -06:00
James Betker
6f8406fbdc Fixed ConfigurableSwitchedGenerator bug 2020-06-16 16:53:57 -06:00
James Betker
7d541642aa Get rid of SwitchedResidualGenerator
Just use the configurable one instead..
2020-06-16 16:23:29 -06:00
James Betker
379b96eb55 Output histograms with SwitchedResidualGenerator
This also fixes the initialization weight for the configurable generator.
2020-06-16 15:54:37 -06:00
James Betker
f8b67f134b Get proper contiguous view for backwards compatibility 2020-06-16 14:27:16 -06:00
James Betker
2def96203e Mods to SwitchedResidualGenerator_arch
- Increased processing for high-resolution switches
- Do stride=2 first in HalvingProcessingBlock
2020-06-16 14:19:12 -06:00
James Betker
70c764b9d4 Create a configurable SwichedResidualGenerator
Also move attention image generator out of repo
2020-06-16 13:24:07 -06:00
James Betker
df1046c318 New arch: SwitchedResidualGenerator_arch
The concept here is to use switching to split the generator into two functions:
interpretation and transformation. Transformation is done at the pixel level by
relatively simple conv layers, while interpretation is computed at various levels
by far more complicated conv stacks. The two are merged using the switching
mechanism.

This architecture is far less computationally intensive that RRDB.
2020-06-16 11:23:50 -06:00
James Betker
ddfd7f67a0 Get rid of biggan
Not really sure it's a great fit for what is being done here.
2020-06-16 11:21:44 -06:00
James Betker
0a714e8451 Fix initialization in mhead switched rrdb 2020-06-15 21:32:03 -06:00
James Betker
be7982b9ae Add skip heads to switcher
These pass through the input so that it can be selected by the attention mechanism.
2020-06-14 12:46:54 -06:00
James Betker
6c27ddc9b5 Misc 2020-06-14 11:03:02 -06:00
James Betker
6c0e9f45c7 Add GPU mem tracing module 2020-06-14 11:02:54 -06:00
James Betker
48532a0a8a Fix initial_stride on lowdim models 2020-06-14 11:02:16 -06:00
James Betker
532704af40 Multiple modifications for experimental RRDB architectures
- Add LowDimRRDB; essentially a "normal RRDB" but the RDB blocks process at a low dimension using PixelShuffle
- Add switching wrappers around it
- Add support for switching on top of multi-headed inputs and outputs
- Moves PixelUnshuffle to arch_util
2020-06-13 11:37:27 -06:00
James Betker
e89f28ead0 Update multirrdb to do HR fixing in the base image dimension. 2020-06-11 08:43:39 -06:00
James Betker
d3b2cbfe7c Fix loading new state dicts for RRDB 2020-06-11 08:25:57 -06:00
James Betker
5ca53e7786 Add alternative first block for PixShuffleRRDB 2020-06-10 21:45:24 -06:00
James Betker
43b7fccc89 Fix mhead attention integration bug for RRDB 2020-06-10 12:02:33 -06:00
James Betker
12e8fad079 Add serveral new RRDB architectures 2020-06-09 13:28:55 -06:00
James Betker
296135ec18 Add doResizeLoss to dataset
doResizeLoss has a 50% chance to resize the LQ image to 50% size,
then back to original size. This is useful to training a generator to
recover these lost pixel values while also being able to do
repairs on higher resolution images during training.
2020-06-08 11:27:06 -06:00
James Betker
786a4288d6 Allow switched RRDBNet to record metrics and decay temperature 2020-06-08 11:10:38 -06:00
James Betker
ae3301c0ea SwitchedRRDB work
Renames AttentiveRRDB to SwitchedRRDB. Moves SwitchedConv to
an external repo (neonbjb/switchedconv). Switchs RDB blocks instead
of conv blocks. Works good!
2020-06-08 08:47:34 -06:00
James Betker
93528ff8df Merge branch 'gan_lab' of https://github.com/neonbjb/mmsr into gan_lab 2020-06-07 16:59:31 -06:00
James Betker
805bd129b7 Switched conv partial impl 2020-06-07 16:59:22 -06:00
James Betker
9e203d07c4 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-06-07 16:56:12 -06:00
James Betker
299d855b34 Enable forced learning rates 2020-06-07 16:56:05 -06:00
James Betker
efb5b3d078 Add switched_conv 2020-06-07 16:45:07 -06:00
James Betker
063719c5cc Fix attention conv bugs 2020-06-06 18:31:02 -06:00
James Betker
cbedd6340a Add RRDB with attention 2020-06-05 21:02:08 -06:00
James Betker
ef5d8a0ed1 Misc 2020-06-05 21:01:50 -06:00
James Betker
318a604405 Allow weighting of input data
This essentially allows you to give some datasets more
importance than others for the purposes of reaching a more
refined network.
2020-06-04 10:05:21 -06:00
James Betker
edf0f8582e Fix rrdb bug 2020-06-02 11:15:55 -06:00