James Betker
6657a406ac
Mods needed to support training a corruptor again:
...
- Allow original SPSRNet to have a specifiable block increment
- Cleanup
- Bug fixes in code that hasnt been touched in awhile.
2020-09-04 15:33:39 -06:00
James Betker
bfdfaab911
Checkpoint RRDB
...
Greatly reduces memory consumption with a low performance penalty
2020-09-04 15:32:00 -06:00
James Betker
696242064c
Use tensor checkpointing to drastically reduce memory usage
...
This comes at the expense of computation, but since we can use much larger
batches, it results in a net speedup.
2020-09-03 11:33:36 -06:00
James Betker
365813bde3
Add InterpolateInjector
2020-09-03 11:32:47 -06:00
James Betker
d90c96e55e
Fix greyscale injector
2020-09-02 10:29:40 -06:00
James Betker
8b52d46847
Interpreted feature loss to extensibletrainer
2020-09-02 10:08:24 -06:00
James Betker
886d59d5df
Misc fixes & adjustments
2020-09-01 07:58:11 -06:00
James Betker
0a9b85f239
Fix vgg_gn input_img_factor
2020-08-31 09:50:30 -06:00
James Betker
4b4d08bdec
Enable testing in ExtensibleTrainer, fix it in SRGAN_model
...
Also compute fea loss for this.
2020-08-31 09:41:48 -06:00
James Betker
b2091cb698
feamod fix
2020-08-30 08:08:49 -06:00
James Betker
a56e906f9c
train HR feature trainer
2020-08-29 09:27:48 -06:00
James Betker
0e859a8082
4x spsr ref (not workin)
2020-08-29 09:27:18 -06:00
James Betker
25832930db
Update loss with lr crossgan
2020-08-26 17:57:22 -06:00
James Betker
cbd5e7a986
Support old school crossgan in extensibletrainer
2020-08-26 17:52:35 -06:00
James Betker
8a6a2e6e2e
Rev3 of the full image ref arch
2020-08-26 17:11:01 -06:00
James Betker
f35b3ad28f
Fix val behavior for ExtensibleTrainer
2020-08-26 08:44:22 -06:00
James Betker
434ed70a9a
Wrap vgg disc
2020-08-25 18:14:45 -06:00
James Betker
83f2f8d239
more debugging
2020-08-25 18:12:12 -06:00
James Betker
3f60281da7
Print when wrapping
2020-08-25 18:08:46 -06:00
James Betker
bae18c05e6
wrap disc grad
2020-08-25 17:58:20 -06:00
James Betker
f85f1e21db
Turns out, can't do that
2020-08-25 17:18:52 -06:00
James Betker
935a735327
More dohs
2020-08-25 17:05:16 -06:00
James Betker
53e67bdb9c
Distribute get_grad_no_padding
2020-08-25 17:03:18 -06:00
James Betker
2f706b7d93
I an inept.
2020-08-25 16:42:59 -06:00
James Betker
8bae0de769
ffffffffffffffffff
2020-08-25 16:41:01 -06:00
James Betker
1fe16f71dd
Fix bug reporting spsr gan weight
2020-08-25 16:37:45 -06:00
James Betker
96586d6592
Fix distributed d_grad
2020-08-25 16:06:27 -06:00
James Betker
09a9079e17
Check rank before doing image logging.
2020-08-25 16:00:49 -06:00
James Betker
a1800f45ef
Fix for referencingmultiplexer
2020-08-25 15:43:12 -06:00
James Betker
a65b07607c
Reference network
2020-08-25 11:56:59 -06:00
James Betker
f9276007a8
More fixes to corrupt_fea
2020-08-23 17:52:18 -06:00
James Betker
0005c56cd4
dbg
2020-08-23 17:43:03 -06:00
James Betker
4bb5b3c981
corfea debugging
2020-08-23 17:39:02 -06:00
James Betker
7713cb8df5
Corrupted features in srgan
2020-08-23 17:32:03 -06:00
James Betker
dffc15184d
More ExtensibleTrainer work
...
It runs now, just need to debug it to reach performance parity with SRGAN. Sweet.
2020-08-23 17:22:45 -06:00
James Betker
afdd93fbe9
Grey feature
2020-08-22 13:41:38 -06:00
James Betker
e59e712e39
More ExtensibleTrainer work
2020-08-22 13:08:33 -06:00
James Betker
f40545f235
ExtensibleTrainer work
2020-08-22 08:24:34 -06:00
James Betker
a498d7b1b3
Report l_g_gan_grad before weight multiplication
2020-08-20 11:57:53 -06:00
James Betker
9d77a4db2e
Allow initial temperature to be specified to SPSR net for inference
2020-08-20 11:57:34 -06:00
James Betker
24bdcc1181
Let SwitchedSpsr transform count be specified
2020-08-18 09:10:25 -06:00
James Betker
74cdaa2226
Some work on extensible trainer
2020-08-18 08:49:32 -06:00
James Betker
868d0aa442
Undo early dim reduction on grad branch for SPSR_arch
2020-08-14 16:23:42 -06:00
James Betker
2d205f52ac
Unite spsr_arch switched gens
...
Found a pretty good basis model.
2020-08-12 17:04:45 -06:00
James Betker
3d0ece804b
SPSR LR2
2020-08-12 08:45:49 -06:00
James Betker
ab04ca1778
Extensible trainer (in progress)
2020-08-12 08:45:23 -06:00
James Betker
cb316fabc7
Use LR data for image gradient prediction when HR data is disjoint
2020-08-10 15:00:28 -06:00
James Betker
f0e2816239
Denoise attention maps
2020-08-10 14:59:58 -06:00
James Betker
59aba1daa7
LR switched SPSR arch
...
This variant doesn't do conv processing at HR, which should save
a ton of memory in inference. Lets see how it works.
2020-08-10 13:03:36 -06:00
James Betker
4e972144ae
More attention fixes for switched_spsr
2020-08-07 21:11:50 -06:00
James Betker
d02509ef97
spsr_switched missing import
2020-08-07 21:05:29 -06:00
James Betker
887806ffa0
Finish up spsr_switched
2020-08-07 21:03:48 -06:00
James Betker
1d5f4f6102
Crossgan
2020-08-07 21:03:39 -06:00
James Betker
fd7b6ca0a9
Comptue gan_grad_branch....
2020-08-06 12:11:40 -06:00
James Betker
30b16d5235
Update how branch GAN grad is disseminated
2020-08-06 11:13:02 -06:00
James Betker
1f21c02f8b
Add cross-compare discriminator
2020-08-06 08:56:21 -06:00
James Betker
be272248af
More RAGAN fixes
2020-08-05 16:47:21 -06:00
James Betker
26a6a5d512
Compute grad GAN loss against both the branch and final target, simplify pixel loss
...
Also fixes a memory leak issue where we weren't detaching our loss stats when
logging them. This stabilizes memory usage substantially.
2020-08-05 12:08:15 -06:00
James Betker
299ee13988
More RAGAN fixes
2020-08-05 11:03:06 -06:00
James Betker
b8a4df0a0a
Enable RAGAN in SPSR, retrofit old RAGAN for efficiency
2020-08-05 10:34:34 -06:00
James Betker
3ab39f0d22
Several new spsr nets
2020-08-05 10:01:24 -06:00
James Betker
3c0a2d6efe
Fix grad branch debug out
2020-08-04 16:43:43 -06:00
James Betker
ec2a795d53
Fix multistep optimizer (feeding from wrong config params)
2020-08-04 16:42:58 -06:00
James Betker
4bfbdaf94f
Don't recompute generator outputs for D in standard operation
...
Should significantly improve training performance with negligible
results differences.
2020-08-04 11:28:52 -06:00
James Betker
11b227edfc
Whoops
2020-08-04 10:30:40 -06:00
James Betker
6d25bcd5df
Apply fixes to grad discriminator
2020-08-04 10:25:13 -06:00
James Betker
c7e5d3888a
Add pix_grad_branch loss to metrics
2020-08-03 16:21:05 -06:00
James Betker
0d070b47a7
Add simplified SPSR architecture
...
Basically just cleaning up the code, removing some bad conventions,
and reducing complexity somewhat so that I can play around with
this arch a bit more easily.
2020-08-03 10:25:37 -06:00
James Betker
47e24039b5
Fix bug that makes feature loss run even when it is off
2020-08-02 20:37:51 -06:00
James Betker
328afde9c0
Integrate SPSR into SRGAN_model
...
SPSR_model really isn't that different from SRGAN_model. Rather than continuing to re-implement
everything I've done in SRGAN_model, port the new stuff from SPSR over.
This really demonstrates the need to refactor SRGAN_model a bit to make it cleaner. It is quite the
beast these days..
2020-08-02 12:55:08 -06:00
James Betker
c8da78966b
Substantial SPSR mods & fixes
...
- Added in gradient accumulation via mega-batch-factor
- Added AMP
- Added missing train hooks
- Added debug image outputs
- Cleaned up including removing GradientPenaltyLoss, custom SpectralNorm
- Removed all the custom discriminators
2020-08-02 10:45:24 -06:00
James Betker
f894ba8f98
Add SPSR_module
...
This is a port from the SPSR repo, it's going to need a lot of work to be properly integrated
but as of this commit it at least runs.
2020-08-01 22:02:54 -06:00
James Betker
f33ed578a2
Update how attention_maps are created
2020-08-01 20:23:46 -06:00
James Betker
c139f5cd17
More torch 1.6 fixes
2020-07-31 17:03:20 -06:00
James Betker
a66fbb32b6
Fix fixed_disc DataParallel issue
2020-07-31 16:59:23 -06:00
James Betker
8dd44182e6
Fix scale torch warning
2020-07-31 16:56:04 -06:00
James Betker
bcebed19b7
Fix pixdisc bugs
2020-07-31 16:38:14 -06:00
James Betker
eb11a08d1c
Enable disjoint feature networks
...
This is done by pre-training a feature net that predicts the features
of HR images from LR images. Then use the original feature network
and this new one in tandem to work only on LR/Gen images.
2020-07-31 16:29:47 -06:00
James Betker
6e086d0c20
Fix fixed_disc
2020-07-31 15:07:10 -06:00
James Betker
d5fa059594
Add capability to have old discriminators serve as feature networks
2020-07-31 14:59:54 -06:00
James Betker
6b45b35447
Allow multi_step_lr_scheduler to load a new LR schedule when restoring state
2020-07-31 11:21:11 -06:00
James Betker
e37726f302
Add feature_model for training custom feature nets
2020-07-31 11:20:39 -06:00
James Betker
7629cb0e61
Add FDPL Loss
...
New loss type that can replace PSNR loss. Works against the frequency domain
and focuses on frequency features loss during hr->lr conversion.
2020-07-30 20:47:57 -06:00
James Betker
85ee64b8d9
Turn down feadisc intensity
...
Honestly - this feature is probably going to be removed soon, so backwards
compatibility is not a huge deal anymore.
2020-07-27 15:28:55 -06:00
James Betker
ebb199e884
Get rid of safety valve (probably being encountered in val)
2020-07-26 22:51:59 -06:00
James Betker
d09ed4e5f7
Misc fixes
2020-07-26 22:44:24 -06:00
James Betker
c54784ae9e
Fix feature disc log item error
2020-07-26 22:25:59 -06:00
James Betker
9a8f227501
Allow separate dataset to pushed in for GAN-only training
2020-07-26 21:44:45 -06:00
James Betker
b06e1784e1
Fix SRG4 & switch disc
...
"fix". hehe.
2020-07-25 17:16:54 -06:00
James Betker
e6e91a1d75
Add SRG4
...
Back to the idea that maybe what we need is a hybrid
approach between pure switches and RDB.
2020-07-24 20:32:49 -06:00
James Betker
3320ad685f
Fix mega_batch_factor not set for test
2020-07-24 12:26:44 -06:00
James Betker
c50cce2a62
Add an abstract, configurabler weight scheduling class and apply it to the feature weight
2020-07-23 17:03:54 -06:00
James Betker
9ccf771629
Fix feature validation, wrong device
...
Only shows up in distributed training for some reason.
2020-07-23 10:16:34 -06:00
James Betker
bba283776c
Enable find_unused_parameters for DistributedDataParallel
...
attention_norm has some parameters which are not used to compute grad,
which is causing failures in the distributed case.
2020-07-23 09:08:13 -06:00
James Betker
dbf6147504
Add switched discriminator
...
The logic is that the discriminator may be incapable of providing a truly
targeted loss for all image regions since it has to be too generic
(basically the same argument for the switched generator). So add some
switches in! See how it works!
2020-07-22 20:52:59 -06:00
James Betker
106b8da315
Assert that temperature is set properly in eval mode.
2020-07-22 20:50:59 -06:00
James Betker
c74b9ee2e4
Add a way to disable grad on portions of the generator graph to save memory
2020-07-22 11:40:42 -06:00
James Betker
e3adafbeac
Add convert_model.py and a hacky way to add extra layers to a model
2020-07-22 11:39:45 -06:00
James Betker
7f7e17e291
Update feature discriminator further
...
Move the feature/disc losses closer and add a feature computation layer.
2020-07-20 20:54:45 -06:00
James Betker
46aa776fbb
Allow feature discriminator unet to only output closest layer to feature output
2020-07-19 19:05:08 -06:00
James Betker
8a9f215653
Huge set of mods to support progressive generator growth
2020-07-18 14:18:48 -06:00
James Betker
47a525241f
Make attention norm optional
2020-07-18 07:24:02 -06:00
James Betker
ad97a6a18a
Progressive SRG first check-in
2020-07-18 07:23:26 -06:00
James Betker
b08b1cad45
Fix feature decay
2020-07-16 23:27:06 -06:00
James Betker
3e7a83896b
Fix pixgan debugging issues
2020-07-16 11:45:19 -06:00
James Betker
a1bff64d1a
More fixes
2020-07-16 10:48:48 -06:00
James Betker
240f254263
More loss fixes
2020-07-16 10:45:50 -06:00
James Betker
6cfa67d831
Fix featuredisc broadcast error
2020-07-16 10:18:30 -06:00
James Betker
8d061a2687
Add u-net discriminator with feature output
2020-07-16 10:10:09 -06:00
James Betker
0c4c388e15
Remove dualoutputsrg
...
Good idea, didn't pan out.
2020-07-16 10:09:24 -06:00
James Betker
4bcc409fc7
Fix loadSRG2 typo
2020-07-14 10:20:53 -06:00
James Betker
1e4083a35b
Apply temperature mods to all SRG models
...
(Honestly this needs to be base classed at this point)
2020-07-14 10:19:35 -06:00
James Betker
7659bd6818
Fix temperature equation
2020-07-14 10:17:14 -06:00
James Betker
853468ef82
Allow legacy state_dicts in srg2
2020-07-14 10:03:45 -06:00
James Betker
1b1431133b
Add DualOutputSRG
...
Also removes the old multi-return mechanism that Generators support.
Also fixes AttentionNorm.
2020-07-14 09:28:24 -06:00
James Betker
a2285ff2ee
Scale anorm by transform count
2020-07-13 08:49:09 -06:00
James Betker
dd0bbd9a7c
Enable AttentionNorm on SRG2
2020-07-13 08:38:17 -06:00
James Betker
4c0f770f2a
Fix inverted temperature curve bug
2020-07-12 11:02:50 -06:00
James Betker
14d23b9d20
Fixes, do fake swaps less often in pixgan discriminator
2020-07-11 21:22:11 -06:00
James Betker
ba6187859a
err5
2020-07-10 23:02:56 -06:00
James Betker
902527dfaa
err4
2020-07-10 23:00:21 -06:00
James Betker
020b3361fa
err3
2020-07-10 22:57:34 -06:00
James Betker
b3a2c21250
err2
2020-07-10 22:52:02 -06:00
James Betker
716433db1f
err1
2020-07-10 22:50:56 -06:00
James Betker
0b7193392f
Implement unet disc
...
The latest discriminator architecture was already pretty much a unet. This
one makes that official and uses shared layers. It also upsamples one additional
time and throws out the lowest upsampling result.
The intent is to delete the old vgg pixdisc, but I'll keep it around for a bit since
I'm still trying out a few models with it.
2020-07-10 16:24:42 -06:00
James Betker
812c684f7d
Update pixgan swap algorithm
...
- Swap multiple blocks in the image instead of just one. The discriminator was clearly
learning that most blocks have one region that needs to be fixed.
- Relax block size constraints. This was in place to gaurantee that the discriminator
signal was clean. Instead, just downsample the "loss image" with bilinear interpolation.
The result is noisier, but this is actually probably healthy for the discriminator.
2020-07-10 15:56:14 -06:00
James Betker
33ca3832e1
Move ExpansionBlock to arch_util
...
Also makes all processing blocks have a conformant signature.
Alters ExpansionBlock to perform a processing conv on the passthrough
before the conjoin operation - this will break backwards compatibilty with SRG2.
2020-07-10 15:53:41 -06:00
James Betker
5e8b52f34c
Misc changes
2020-07-10 09:45:48 -06:00
James Betker
5f2c722a10
SRG2 revival
...
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
2020-07-09 17:34:51 -06:00
James Betker
12da993da8
More fixes...
2020-07-08 22:07:09 -06:00
James Betker
7d6eb28b87
More fixes
2020-07-08 22:00:57 -06:00
James Betker
b2507be13c
Fix up pixgan loss and pixdisc
2020-07-08 21:27:48 -06:00
James Betker
26a4a66d1c
Bug fixes and new gan mechanism
...
- Removed a bunch of unnecessary image loggers. These were just consuming space and never being viewed
- Got rid of support of artificial var_ref support. The new pixdisc is what i wanted to implement then - it's much better.
- Add pixgan GAN mechanism. This is purpose-built for the pixdisc. It is intended to promote a healthy discriminator
- Megabatchfactor was applied twice on metrics, fixed that
Adds pix_gan (untested) which swaps a portion of the fake and real image with each other, then expects the discriminator
to properly discriminate the swapped regions.
2020-07-08 17:40:26 -06:00
James Betker
4305be97b4
Update log metrics
...
They should now be universal regardless of job configuration
2020-07-07 15:33:22 -06:00
James Betker
8a4eb8241d
SRG3 work
...
Operates on top of a pre-trained SpineNET backbone (trained on CoCo 2017 with RetinaNet)
This variant is extremely shallow.
2020-07-07 13:46:40 -06:00
James Betker
0acad81035
More SRG2 adjustments..
2020-07-06 22:40:40 -06:00
James Betker
086b2f0570
More bugs
2020-07-06 22:28:07 -06:00
James Betker
d4d4f85fc0
Bug fixes
2020-07-06 22:25:40 -06:00
James Betker
3c31bea1ac
SRG2 architectural changes
2020-07-06 22:22:29 -06:00
James Betker
9a1c3241f5
Switch discriminator to groupnorm
2020-07-06 20:59:59 -06:00
James Betker
6beefa6d0c
PixDisc - Add two more levels of losses coming from this gen at higher resolutions
2020-07-06 11:15:52 -06:00
James Betker
2636d3b620
Fix assertion error
2020-07-06 09:23:53 -06:00
James Betker
8f92c0a088
Interpolate attention well before softmax
2020-07-06 09:18:30 -06:00
James Betker
72f90cabf8
More pixdisc fixes
2020-07-05 22:03:16 -06:00
James Betker
909007ee6a
Add G_warmup
...
Let the Generator get to a point where it is at least competing with the discriminator before firing off.
Backwards from most GAN architectures, but this one is a bit different from most.
2020-07-05 21:58:35 -06:00
James Betker
a47a5dca43
Fix pixdisc bug
2020-07-05 21:57:52 -06:00
James Betker
d0957bd7d4
Alter weight initialization for transformation blocks
2020-07-05 17:32:46 -06:00
James Betker
16d1bf6dd7
Replace ConvBnRelus in SRG2 with Silus
2020-07-05 17:29:20 -06:00
James Betker
10f7e49214
Add ConvBnSilu to replace ConvBnRelu
...
Relu produced good performance gains over LeakyRelu, but
GAN performance degraded significantly. Try SiLU as an alternative
to see if it's the leaky-ness we are looking for or the smooth activation
curvature.
2020-07-05 13:39:08 -06:00
James Betker
9934e5d082
Move SRG1 to identical to new
2020-07-05 08:49:34 -06:00