Commit Graph

1005 Commits

Author SHA1 Message Date
James Betker
54accfa693 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-26 11:12:37 -06:00
James Betker
ff58c6484a Fixes to unified chunk datasets to support stereoscopic training 2020-10-26 11:12:22 -06:00
James Betker
b2f803588b Fix multi_modal_train.py 2020-10-26 11:10:22 -06:00
James Betker
f857eb00a8 Allow tecogan losses to compute at 32px 2020-10-26 11:09:55 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
85c07f85d9 Update flownet submodule 2020-10-24 11:59:00 -06:00
James Betker
327cdbe110 Support configurable multi-modal training 2020-10-24 11:57:39 -06:00
James Betker
9c3d059ef0 Updates to be able to train flownet2 in ExtensibleTrainer
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
1dbcbfbac8 Restore ChainedEmbeddingGenWithStructure
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
8e5b6682bf Add PairedFrameDataset 2020-10-23 20:58:07 -06:00
James Betker
7a75d10784 Arch cleanup 2020-10-23 09:35:33 -06:00
James Betker
646d6a621a Support 4x zoom on ChainedEmbeddingGen 2020-10-23 09:25:58 -06:00
James Betker
8636492db0 Copy train.py mods to train2 2020-10-22 17:16:36 -06:00
James Betker
e9c0b9f0fd More adjustments to support multi-modal training
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f Class-ify train.py and workon multi-modal trainer 2020-10-22 16:15:31 -06:00
James Betker
15e00e9014 Finish integration with autocast
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3 Add multi-modal trainer 2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8 Fix multifaceted chain gen 2020-10-22 13:27:06 -06:00
James Betker
f9dc472f63 Misc nonfunctional mods to datasets 2020-10-22 10:16:17 -06:00
James Betker
43c4f92123 Collapse progressive zoom candidates into the batch dimension
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
James Betker
680d635420 Enable ExtensibleTrainer to skip steps when state keys are missing 2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1 Add FFT injector 2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms 2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0 Allow recurrent losses to be weighted 2020-10-21 16:59:44 -06:00
James Betker
5753e77d67 ChainedGen: Output debugging information on blocks 2020-10-21 16:36:23 -06:00
James Betker
b54de69153 Misc 2020-10-21 11:08:21 -06:00
James Betker
71c3820d2d Fix process_video 2020-10-21 11:08:12 -06:00
James Betker
3c6e600e48 Add capacity for models to self-report visuals 2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b Add bypass to ChainedEmbeddingGen 2020-10-21 11:07:45 -06:00
James Betker
d8c6a4bbb8 Misc 2020-10-20 12:56:52 -06:00
James Betker
aba83e7497 Don't apply jpeg corruption & noise corruption together
This causes some severe noise.
2020-10-20 12:56:35 -06:00
James Betker
111450f4e7 Use areal interpolate for multiscale_dataset 2020-10-19 15:30:25 -06:00
James Betker
a63bf2ea2f Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086 Restore test.py for use as standalone validator 2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8 Fix recurrent=None bug in ChainedEmbeddingGen 2020-10-19 15:25:12 -06:00
James Betker
331c40f0c8 Allow starting step to be forced
Useful for testing purposes or to force a validation.
2020-10-19 15:23:04 -06:00
James Betker
8ca566b621 Revert "Misc"
This reverts commit 0e3ea63a14.

# Conflicts:
#	codes/test.py
#	codes/train.py
2020-10-19 13:34:54 -06:00
James Betker
b28e4d9cc7 Add spread loss
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
9b9a6e5925 Add get_paths() to base_unsupervised_image_dataset 2020-10-19 11:30:06 -06:00
James Betker
981d64413b Support validation over a custom injector
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
ffad0e0422 Allow image corruption in multiscale dataset 2020-10-19 10:10:27 -06:00
James Betker
668cafa798 Push correct patch of recurrent embedding to upstream image, rather than whole thing 2020-10-18 22:39:52 -06:00
James Betker
7df378a944 Remove separated vgg discriminator
Checkpointing happens inline instead. Was a dumb idea..

Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5 Fix memory leak with recurrent loss 2020-10-18 10:22:10 -06:00
James Betker
552e70a032 Get rid of excessive checkpointed disc params 2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813 Add a checkpointable discriminator 2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08 Multiscale training in! 2020-10-17 22:54:12 -06:00
James Betker
e706911c83 Fix spinenet bug 2020-10-17 20:20:36 -06:00
James Betker
b008a27d39 Spinenet should allow bypassing the initial conv
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c7f3fc4dd9 Enable chunk_with_reference to work without centers
Moving away from this so it doesn't matter too much. Also fixes an issue
with the "ignore" flag.
2020-10-17 20:09:08 -06:00
James Betker
b45e132a9d Allow first n tiles to be ignored
Helps zoom in with chunked dataset
2020-10-17 09:45:03 -06:00
James Betker
c1c9c5681f Swap recurrence 2020-10-17 08:40:28 -06:00
James Betker
6141aa1110 More recurrence fixes for chainedgen 2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b Allow recurrence to specified for chainedgen 2020-10-17 08:32:29 -06:00
James Betker
fc4c064867 Add recurrent support to chainedgenwithstructure 2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2 Don't use several stages of spinenet_arch
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339 Go back to torch's DDP
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e Add ChainedGenWithStructure 2020-10-16 20:44:36 -06:00
James Betker
96f1be30ed Add use_generator_as_filter 2020-10-16 20:43:55 -06:00
James Betker
617d97e19d Add ChainedEmbeddingGen 2020-10-15 23:18:08 -06:00
James Betker
c4543ce124 Set post_transform_block to None where applicable 2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb SSGSimpler network 2020-10-15 17:18:44 -06:00
James Betker
1ba01d69b5 Move datasets to INTER_AREA interpolation for downsizing
Looks **FAR** better visually
2020-10-15 17:18:23 -06:00
James Betker
d56745b2ec JPEG-broad adjustment 2020-10-15 10:14:51 -06:00
James Betker
eda75c9779 Cleanup fixes 2020-10-15 10:13:17 -06:00
James Betker
920865defb Arch work 2020-10-15 10:13:06 -06:00
James Betker
1dc0b05428 Add multiscale dataset 2020-10-15 10:12:50 -06:00
James Betker
0f4e03183f New image corruptor gradations 2020-10-15 10:12:25 -06:00
James Betker
1f20d59c31 Revert big switch back 2020-10-14 11:03:34 -06:00
James Betker
9815980329 Update SwitchedConv 2020-10-13 20:57:12 -06:00
James Betker
24792bdb4f Codebase cleanup
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba Mods to support video processing with teco networks 2020-10-13 20:47:05 -06:00
James Betker
17d78195ee Mods to SRG to support returning switch logits 2020-10-13 20:46:37 -06:00
James Betker
cc915303a5 Fix SPSR calls into SwitchComputer 2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899 Merge remote-tracking branch 'origin/gan_lab' into gan_lab
# Conflicts:
#	codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9 Add the "BigSwitch" 2020-10-13 10:11:10 -06:00
James Betker
8014f050ac Clear metrics properly
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-12 17:43:51 -06:00
James Betker
731700ab2c checkpoint in ssg 2020-10-12 17:43:28 -06:00
James Betker
ca523215c6 Fix recurrent std in arch 2020-10-12 17:42:32 -06:00
James Betker
05377973bf Allow initial recurrent input to be specified (optionally) 2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6 Add ssgr1 recurrence 2020-10-12 17:18:19 -06:00
James Betker
c1a00f31b7 Update switched_conv 2020-10-12 10:37:45 -06:00
James Betker
d7d7590f3e Fix constant injector - wasn't working in test 2020-10-12 10:36:30 -06:00
James Betker
e7cf337dba Fix bug with chunk_with_reference 2020-10-12 10:23:03 -06:00
James Betker
ce163ad4a9 Update SSGdeep 2020-10-12 10:22:08 -06:00
James Betker
2bc5701b10 misc 2020-10-12 10:21:25 -06:00
James Betker
3409d88a1c Add PANet arch 2020-10-12 10:20:55 -06:00
James Betker
7cbf4fa665 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-11 08:33:30 -06:00
James Betker
92cb83958a Return zeros rather than None when image cant be read 2020-10-11 08:33:18 -06:00
James Betker
a9c2e97391 Constant injector and teco fixes 2020-10-11 08:20:07 -06:00
James Betker
e785029936 Mods needed to support SPSR archs with teco gan 2020-10-10 22:39:55 -06:00
James Betker
120072d464 Add constant injector 2020-10-10 21:50:23 -06:00
James Betker
f99812e14d Fix tecogan_losses errors 2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7 Alter teco_losses to feed a recurrent input in as separate 2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d Add MarginRemoval injector 2020-10-09 20:35:56 -06:00
James Betker
0011d445c8 Fix loss indexing 2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc For element loss added 2020-10-09 19:51:44 -06:00
James Betker
61e5047c60 Fix loss accumulator when buffers are not filled
They were reporting incorrect losses.
2020-10-09 19:47:59 -06:00
James Betker
fe50d6f9d0 Fix attention images 2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c Allow tecogan to be used in process_video 2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69 Add network architecture built for teco 2020-10-09 08:40:14 -06:00
James Betker
b3d0baaf17 Improve multiframe dataset memory usage 2020-10-09 08:40:00 -06:00
James Betker
afe6af88af Fix attention print issue 2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4 Converge SSG architectures into unified switching base class
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
3cc56cd00b Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-08 16:12:05 -06:00
James Betker
7d8d9dafbb misc 2020-10-08 16:12:00 -06:00
James Betker
856ef4d21d Update switched_conv 2020-10-08 16:10:23 -06:00
James Betker
1eb516d686 Fix more distributed bugs 2020-10-08 14:32:45 -06:00
James Betker
b36ba0460c Fix multi-frame dataset OBO error 2020-10-08 12:21:04 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5 Allow tecogan to support generators that only output a tensor (instead of a list) 2020-10-08 09:26:25 -06:00
James Betker
969bcd9021 Use local checkpoint in SSG 2020-10-08 08:54:46 -06:00
James Betker
c93dd623d7 Tecogan losses work 2020-10-07 23:11:58 -06:00
James Betker
29bf78d791 Update switched_conv submodule 2020-10-07 23:11:50 -06:00
James Betker
c96f5b2686 Import switched_conv as a submodule 2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4 More tecogan fixes 2020-10-07 12:41:17 -06:00
James Betker
a62a5dbb5f Clone and detach in recursively_detach 2020-10-07 12:41:00 -06:00
James Betker
1c44d395af Tecogan work
Its training!  There's still probably plenty of bugs though..
2020-10-07 09:03:30 -06:00
James Betker
e9d7371a61 Add concatenate injector 2020-10-07 09:02:42 -06:00
James Betker
8a7e993aea Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-06 20:41:58 -06:00
James Betker
b2c4b2a16d Move gpu_ids out of if statement 2020-10-06 20:40:20 -06:00
James Betker
1e415b249b Add tag that can be applied to prevent parameter training 2020-10-06 20:39:49 -06:00
James Betker
2f2e3f33f8 StackedSwitchedGenerator_5lyr 2020-10-06 20:39:32 -06:00
James Betker
6217b48e3f Fix spsr_arch bug 2020-10-06 20:38:47 -06:00
James Betker
4290918359 Add distributed_checkpoint for more efficient checkpoints 2020-10-06 20:38:38 -06:00
James Betker
cffc596141 Integrate flownet2 into codebase, add teco visual debugs 2020-10-06 20:35:39 -06:00
James Betker
e4b89a172f Reduce spsr7 memory usage 2020-10-05 22:05:56 -06:00
James Betker
4111942ada Support attention deferral in deep ssgr 2020-10-05 19:35:55 -06:00
James Betker
840927063a Work on tecogan losses 2020-10-05 19:35:28 -06:00
James Betker
0e3ea63a14 Misc 2020-10-05 18:01:50 -06:00
James Betker
2875822024 SPSR9 arch
takes some of the stuff I learned with SGSR yesterday and applies it to spsr
2020-10-05 08:47:51 -06:00
James Betker
51044929af Don't compute attention statistics on multiple generator invocations of the same data 2020-10-05 00:34:29 -06:00
James Betker
e760658fdb Another fix.. 2020-10-04 21:08:00 -06:00
James Betker
a890e3a9c0 Fix geometric loss not handling 0 index 2020-10-04 21:05:01 -06:00
James Betker
c3ef8a4a31 Stacked switches - return a tuple 2020-10-04 21:02:24 -06:00
James Betker
13f97e1e97 Add recursive loss 2020-10-04 20:48:15 -06:00
James Betker
ffd069fd97 Lots of SSG work
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
aca2c7ab41 Full checkpoint-ize SSG1 2020-10-04 18:24:52 -06:00
James Betker
fc396baf1a Move loaded_options to util
Doesn't seem to work with python 3.6
2020-10-03 20:29:06 -06:00
James Betker
2d8e9a9d30 Options fix? 2020-10-03 20:27:12 -06:00
James Betker
e3294939b0 Revert "SSG: offer option to use BN-based attention normalization"
Didn't work. Oh well.

This reverts commit 5cd2b37591.
2020-10-03 17:54:53 -06:00
James Betker
43c6c67fd1 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-03 16:17:31 -06:00
James Betker
5cd2b37591 SSG: offer option to use BN-based attention normalization
Not sure how this is going to work, lets try it.
2020-10-03 16:16:19 -06:00
James Betker
c896939523 Fix recursive checkpoint 2020-10-03 16:15:52 -06:00
James Betker
3cbb9ecd45 Misc 2020-10-03 16:15:42 -06:00
James Betker
35731502c3 Fix checkpoint recursion 2020-10-03 12:52:50 -06:00
James Betker
9b4ed82093 Get rid of unused convs in spsr7 2020-10-03 11:36:26 -06:00
James Betker
b2b81b13a4 Remove recursive utils import 2020-10-03 11:30:05 -06:00
James Betker
3561cc164d Fix up fea_loss calculator (for validation)
Not sure how this was working in regular training mode, but it
was failing in DDP.
2020-10-03 11:19:20 -06:00
James Betker
21d3bb83b2 Use tqdm reporting with validation 2020-10-03 11:16:39 -06:00
James Betker
6c9718ad64 Don't log if you aren't 0 rank 2020-10-03 11:14:13 -06:00
James Betker
922b1d76df Don't record visuals when not on rank 0 2020-10-03 11:10:03 -06:00
James Betker
8197fd646f Don't accumulate losses for metrics when the loss isn't a tensor 2020-10-03 11:03:55 -06:00
James Betker
19a4075e1e Allow checkpointing to be disabled in the options file
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
dd9d7b27ac Add more sophisticated mechanism for balancing GAN losses 2020-10-02 22:53:42 -06:00
James Betker
39865ca3df TOTAL_loss, dumbo 2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655 Loss accumulator fix 2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4 ExtensibleTrainer - don't compute backward when there is no loss 2020-10-02 20:54:06 -06:00
James Betker
146a9125f2 Modify geometric & translational losses so they can be used with embeddings 2020-10-02 20:40:13 -06:00
James Betker
e30a1443cd Change sw2 refs 2020-10-02 09:01:18 -06:00
James Betker
e38716925f Fix spsr8 class init 2020-10-02 09:00:18 -06:00
James Betker
efbf6b737b Update validate_data to work with SingleImageDataset 2020-10-02 08:58:34 -06:00
James Betker
35469f08e2 Spsr 8 2020-10-02 08:58:15 -06:00
James Betker
c9a9e5c525 Prompt user for gpu_id if multiple gpus are detected 2020-10-01 17:24:50 -06:00
James Betker
aa4fd89018 resnext with groupnorm 2020-10-01 15:49:28 -06:00
James Betker
8beaa47933 resnext discriminator 2020-10-01 11:48:14 -06:00
James Betker
55f2764fef Allow fixup50 to be used as a discriminator 2020-10-01 11:28:18 -06:00
James Betker
7986185fcb Change 'mod_step' to 'every' 2020-10-01 11:28:06 -06:00
James Betker
d9ae970fd9 SSG update 2020-10-01 11:27:51 -06:00
James Betker
e3053e4e55 Exchange SpsrNet for SpsrNetSimplified 2020-09-30 17:01:04 -06:00
James Betker
66d4512029 Fix up translational equivariance loss so it's ready for prime time 2020-09-30 12:01:00 -06:00
James Betker
896b4f5be2 Revert "spsr7 adjustments"
This reverts commit 9fee1cec71.
2020-09-29 18:30:41 -06:00
James Betker
9fee1cec71 spsr7 adjustments 2020-09-29 17:19:59 -06:00
James Betker
dc8f3b24de Don't let duplicate keys be used for injectors and losses 2020-09-29 16:59:44 -06:00
James Betker
0b5a033503 spsr7 + cleanup
SPSR7 adds ref onto spsr6, makes more "common sense" mods.
2020-09-29 16:59:26 -06:00
James Betker
f9b83176f1 Fix bugs in extensibletrainer 2020-09-28 22:09:42 -06:00
James Betker
db52bec4ab spsr6
This is meant to be a variant of SPSR5 that harkens
back to the simpler earlier architectures that do not
have embeddings or ref_ inputs, but do have deep
multiplexers. It does, however, use some of the new
conjoin mechanisms.
2020-09-28 22:09:27 -06:00
James Betker
7e240f2fed Recurrent / teco work 2020-09-28 22:06:56 -06:00
James Betker
57814f18cf More features for multi-frame-dataset 2020-09-28 14:26:15 -06:00
James Betker
aeaf185314 Add RCAN 2020-09-27 16:00:41 -06:00
James Betker
4d29b7729e Model arch cleanup 2020-09-27 11:18:45 -06:00
James Betker
7dff802144 Add MultiFrameDataset
Retrieves video sequence patches rather than single images.
2020-09-27 11:13:06 -06:00
James Betker
d8c3fc9327 Fix random noise corruptor
It was functioning as a color shift
2020-09-27 11:12:24 -06:00
James Betker
c85da79697 Move many dataset functions into a base class 2020-09-27 11:11:58 -06:00
James Betker
eb12b5f887 Misc 2020-09-26 21:27:17 -06:00
James Betker
31641d7f63 Add ImagePatchInjector and TranslationalLoss 2020-09-26 21:25:32 -06:00
James Betker
d8621e611a BackboneSpineNoHead takes ref 2020-09-26 21:25:04 -06:00
James Betker
5a27187c59 More mods to accomodate new dataset 2020-09-25 22:45:57 -06:00
James Betker
254cb1e915 More dataset integration work 2020-09-25 22:19:38 -06:00
James Betker
6d0490a0e6 Tecogan implementation work 2020-09-25 16:38:23 -06:00
James Betker
ce4613ecb9 Finish up single_image_dataset work
Sweet!
2020-09-25 16:37:54 -06:00
James Betker
1cf73c2cce Fix dataset for a val set that includes lq 2020-09-24 18:01:07 -06:00
James Betker
ea565b7eaf More fixes 2020-09-24 17:51:52 -06:00
James Betker
553917a8d1 Fix torchvision import bug 2020-09-24 17:38:34 -06:00
James Betker
58886109d4 Update how spsr arches do attention to conform with sgsr 2020-09-24 16:53:54 -06:00
James Betker
9a50a7966d SiLU doesnt support inplace 2020-09-23 21:09:13 -06:00
James Betker
eda0eadba2 Use custom SiLU
Torch didnt have this before 1.7
2020-09-23 21:05:06 -06:00
James Betker
05963157c1 Several things
- Fixes to 'after' and 'before' defs for steps (turns out they werent working)
- Feature nets take in a list of layers to extract. Not fully implemented yet.
- Fixes bugs with RAGAN
- Allows real input into generator gan to not be detached by param
2020-09-23 11:56:36 -06:00
James Betker
4ab989e015 try again.. 2020-09-22 18:27:52 -06:00
James Betker
3b6c957194 Fix? it again? 2020-09-22 18:25:59 -06:00
James Betker
7b60d9e0d8 Fix? cosine loss 2020-09-22 18:18:35 -06:00
James Betker
2e18c4c22d Add CosineEmbeddingLoss to F 2020-09-22 17:10:29 -06:00
James Betker
f40beb5460 Add 'before' and 'after' defs to injections, steps and optimizers 2020-09-22 17:03:22 -06:00
James Betker
419f77ec19 Some new backbones 2020-09-21 12:36:49 -06:00
James Betker
9429544a60 Spinenet: implementation without 4x downsampling right off the bat 2020-09-21 12:36:30 -06:00
James Betker
384e3d54cc Extract images into jpg, have a multiplier & size threshold 2020-09-21 12:36:03 -06:00
James Betker
bde35ced47 Fix recursive detach 2020-09-20 19:08:13 -06:00
James Betker
53a5657850 Fix SSGR 2020-09-20 19:07:15 -06:00
James Betker
17c569ea62 Add geometric loss 2020-09-20 16:24:23 -06:00
James Betker
17dd99b29b Fix bug with discriminator noise addition
It wasn't using the scale and was applying the noise to the
underlying state variable.
2020-09-20 12:00:27 -06:00
James Betker
dab8ab8a8f Offer option to configure the size of the normal distribution that the target size is drawn from 2020-09-20 11:59:31 -06:00
James Betker
3138f98fbc Allow discriminator noise to be injected at the loss level, cleans up configs 2020-09-19 21:47:52 -06:00
James Betker
e9a39bfa14 Recursively detach all outputs, even if they are nested in data structures 2020-09-19 21:47:34 -06:00
James Betker
fe82785ba5 Add some new architectures to ssg 2020-09-19 21:47:10 -06:00
James Betker
b83f097082 Get rid of get_debug_values from RRDB, rectify outputs 2020-09-19 21:46:36 -06:00
James Betker
e0bd68efda Add ImageFlowInjector 2020-09-19 10:07:00 -06:00
James Betker
e2a146abc7 Add in experiments hook 2020-09-19 10:05:25 -06:00
James Betker
4f75cf0f02 Revert "Full image dataset operates on lists"
Going with an entirely new dataset instead..

This reverts commit 36ec32bf11.
2020-09-18 09:50:43 -06:00
James Betker
36ec32bf11 Full image dataset operates on lists 2020-09-18 09:50:26 -06:00
James Betker
3cb2a9a9d3 New dataset, initial work 2020-09-18 09:49:13 -06:00
James Betker
9a17ade550 Some convenience adjustments to ExtensibleTrainer 2020-09-17 21:05:32 -06:00
James Betker
57fc3f490c Add script for extracting image tiles with reference images 2020-09-17 13:30:51 -06:00
James Betker
9963b37200 Add a new script for loading a discriminator network and using it to filter images 2020-09-17 13:30:32 -06:00
James Betker
f5cd23e2d5 Further patch size adjustments 2020-09-16 16:50:35 -06:00
James Betker
723754c133 Update attention debugger outputting for SSG 2020-09-16 13:09:46 -06:00
James Betker
0b047e5f80 Increase scale of the patch selector random distribution
This will cause larger slices of an image to appear more frequently,
increasing the difficulty of the generator.
2020-09-16 08:27:42 -06:00
James Betker
f211575e9d Save models before validation
Validation often fails with OOM, wasting hours of training time.
Save models first.
2020-09-16 08:17:17 -06:00
James Betker
0918430572 SSG network
This branches off of SPSR. It is identical but substantially reduced
in complexity. It's intended to be my long term working arch.
2020-09-15 20:59:24 -06:00
James Betker
c833bd1eac Misc changes 2020-09-15 20:57:59 -06:00
James Betker
6deab85b9b Add BackboneEncoderNoRef 2020-09-15 16:55:38 -06:00
James Betker
d0321ca5de Don't load amp state dict if amp is disabled 2020-09-14 15:21:42 -06:00
James Betker
94deab2792 Fix error serving gt_fullsize_ref in full_image_dataset 2020-09-14 15:05:44 -06:00
James Betker
ccf8438001 SPSR5
This is SPSR4, but the multiplexers have access to the output of the transformations
for making their decision.
2020-09-13 20:10:24 -06:00
James Betker
5b85f891af Only log the name of the first network in the total_loss training set 2020-09-12 16:07:09 -06:00
James Betker
fb595e72a4 Supporting infrastructure in ExtensibleTrainer to train spsr4
Need to be able to train 2 nets in one step: the backbone will be entirely separate
with its own optimizer (for an extremely low LR).

This functionality was already present, just not implemented correctly.
2020-09-11 22:57:06 -06:00
James Betker
4e44bca611 SPSR4
aka - return of the backbone! I'm tired of massively overparameterized generators
with pile-of-shit multiplexers. Let's give this another try..
2020-09-11 22:55:37 -06:00
James Betker
19896abaea Clean up old SwitchedSpsr arch
It didn't work anyways, so why not?
2020-09-11 16:09:28 -06:00
James Betker
4c2ee66fe4 Fix video processor 2020-09-11 13:10:14 -06:00
James Betker
50ca17bb0a Feature mode -> back to LR fea 2020-09-11 13:09:55 -06:00
James Betker
1086f0476b Fix ref branch using fixed filters 2020-09-11 08:58:35 -06:00
James Betker
8c469b8286 Enable memory checkpointing 2020-09-11 08:44:29 -06:00
James Betker
5189b11dac Add combined dataset for training across multiple datasets 2020-09-11 08:44:06 -06:00
James Betker
313424d7b5 Add new referencing discriminator
Also extend the way losses work so that you can pass
parameters into the discriminator from the config file
2020-09-10 21:35:29 -06:00
James Betker
9e5aa166de Report the standard deviation of ref branches
This patch also ups the contribution
2020-09-10 16:34:41 -06:00
James Betker
668bfbff6d Back to best arch for spsr3 2020-09-10 14:58:14 -06:00
James Betker
992b0a8d98 spsr3 with conjoin stage as part of the switch 2020-09-10 09:11:37 -06:00
James Betker
e0fc5eb50c Temporary commit - noise 2020-09-09 17:12:52 -06:00
James Betker
00da69d450 Temporary commit - ref 2020-09-09 17:09:44 -06:00
James Betker
df59d6c99d More spsr3 mods
- Most branches get their own noise vector now.
- First attention branch has the intended sole purpose of raw image processing
- Remove norms from joiner block
2020-09-09 16:46:38 -06:00
James Betker
747ded2bf7 Fixes to the spsr3
Some lessons learned:
- Biases are fairly important as a relief valve. They dont need to be everywhere, but
  most computationally heavy branches should have a bias.
- GroupNorm in SPSR is not a great idea. Since image gradients are represented
   in this model, normal means and standard deviations are not applicable. (imggrad
   has a high representation of 0).
- Don't fuck with the mainline of any generative model. As much as possible, all
   additions should be done through residual connections. Never pollute the mainline
   with reference data, do that in branches. It basically leaves the mode untrainable.
2020-09-09 15:28:14 -06:00
James Betker
0ffac391c1 SPSR with ref joining 2020-09-09 11:17:07 -06:00
James Betker
c41dc9a48c Add missing requirements 2020-09-09 10:46:08 -06:00
James Betker
3027e6e27d Enable amp to be disabled 2020-09-09 10:45:59 -06:00
James Betker
c04f244802 More mods 2020-09-08 20:36:27 -06:00
James Betker
dffbfd2ec4 Allow SRG checkpointing to be toggled 2020-09-08 15:14:43 -06:00
James Betker
e6207d4c50 SPSR3 work
SPSR3 is meant to fix whatever is causing the switching units
inside of the newer SPSR architectures to fail and basically
not use the multiplexers.
2020-09-08 15:14:23 -06:00
James Betker
5606e8b0ee Fix SRGAN_model/fullimgdataset compatibility 1 2020-09-08 11:34:35 -06:00
James Betker
22c98f1567 Move MultiConvBlock to arch_util 2020-09-08 08:17:27 -06:00
James Betker
146ace0859 CSNLN changes (removed because it doesnt train well) 2020-09-08 08:04:16 -06:00
James Betker
f43df7f5f7 Make ExtensibleTrainer compatible with process_video 2020-09-08 08:03:41 -06:00
James Betker
a18ece62ee Add updated spsr net for test 2020-09-07 17:01:48 -06:00
James Betker
55475d2ac1 Clean up unused archs 2020-09-07 11:38:11 -06:00
James Betker
e8613041c0 Add novograd optimizer 2020-09-06 17:27:08 -06:00
James Betker
a5c2388368 Use lower LQ image size when it is being fed in 2020-09-06 17:26:32 -06:00
James Betker
b1238d29cb Fix trainable not applying to discriminators 2020-09-05 20:31:26 -06:00
James Betker
21ae135f23 Allow Novograd to be used as an optimizer 2020-09-05 16:50:13 -06:00
James Betker
912a4d9fea Fix srg computer bug 2020-09-05 07:59:54 -06:00
James Betker
0dfd8eaf3b Support injectors that run in eval only 2020-09-05 07:59:45 -06:00
James Betker
17aa205e96 New dataset that reads from lmdb 2020-09-04 17:32:57 -06:00
James Betker
44c75f7642 Undo SRG change 2020-09-04 17:32:16 -06:00
James Betker
6657a406ac Mods needed to support training a corruptor again:
- Allow original SPSRNet to have a specifiable block increment
- Cleanup
- Bug fixes in code that hasnt been touched in awhile.
2020-09-04 15:33:39 -06:00
James Betker
bfdfaab911 Checkpoint RRDB
Greatly reduces memory consumption with a low performance penalty
2020-09-04 15:32:00 -06:00
James Betker
8580490a85 Reduce usage of resize operations when not needed in dataloaders. 2020-09-04 15:31:24 -06:00
James Betker
6226b52130 Pin memory in dataloaders by default 2020-09-04 15:30:46 -06:00
James Betker
64a24503f6 Add extract_subimages_with_ref_lmdb for generating lmdb with reference images 2020-09-04 15:30:34 -06:00
James Betker
696242064c Use tensor checkpointing to drastically reduce memory usage
This comes at the expense of computation, but since we can use much larger
batches, it results in a net speedup.
2020-09-03 11:33:36 -06:00
James Betker
365813bde3 Add InterpolateInjector 2020-09-03 11:32:47 -06:00
James Betker
d90c96e55e Fix greyscale injector 2020-09-02 10:29:40 -06:00
James Betker
8b52d46847 Interpreted feature loss to extensibletrainer 2020-09-02 10:08:24 -06:00
James Betker
886d59d5df Misc fixes & adjustments 2020-09-01 07:58:11 -06:00
James Betker
0a9b85f239 Fix vgg_gn input_img_factor 2020-08-31 09:50:30 -06:00
James Betker
4b4d08bdec Enable testing in ExtensibleTrainer, fix it in SRGAN_model
Also compute fea loss for this.
2020-08-31 09:41:48 -06:00
James Betker
b2091cb698 feamod fix 2020-08-30 08:08:49 -06:00
James Betker
a56e906f9c train HR feature trainer 2020-08-29 09:27:48 -06:00
James Betker
0e859a8082 4x spsr ref (not workin) 2020-08-29 09:27:18 -06:00
James Betker
623f3b99b2 Stupid pathing.. 2020-08-26 17:58:24 -06:00
James Betker
25832930db Update loss with lr crossgan 2020-08-26 17:57:22 -06:00
James Betker
80aa83bfd2 Try copytree for tb_logger again. 2020-08-26 17:55:02 -06:00
James Betker
cbd5e7a986 Support old school crossgan in extensibletrainer 2020-08-26 17:52:35 -06:00
James Betker
b593d8e7c3 Save tb_logger to alt_path 2020-08-26 17:45:07 -06:00
James Betker
8a6a2e6e2e Rev3 of the full image ref arch 2020-08-26 17:11:01 -06:00
James Betker
f35b3ad28f Fix val behavior for ExtensibleTrainer 2020-08-26 08:44:22 -06:00
James Betker
434ed70a9a Wrap vgg disc 2020-08-25 18:14:45 -06:00
James Betker
83f2f8d239 more debugging 2020-08-25 18:12:12 -06:00
James Betker
3f60281da7 Print when wrapping 2020-08-25 18:08:46 -06:00
James Betker
bae18c05e6 wrap disc grad 2020-08-25 17:58:20 -06:00
James Betker
f85f1e21db Turns out, can't do that 2020-08-25 17:18:52 -06:00
James Betker
935a735327 More dohs 2020-08-25 17:05:16 -06:00
James Betker
53e67bdb9c Distribute get_grad_no_padding 2020-08-25 17:03:18 -06:00
James Betker
2f706b7d93 I an inept. 2020-08-25 16:42:59 -06:00
James Betker
8bae0de769 ffffffffffffffffff 2020-08-25 16:41:01 -06:00
James Betker
1fe16f71dd Fix bug reporting spsr gan weight 2020-08-25 16:37:45 -06:00
James Betker
96586d6592 Fix distributed d_grad 2020-08-25 16:06:27 -06:00
James Betker
09a9079e17 Check rank before doing image logging. 2020-08-25 16:00:49 -06:00
James Betker
a1800f45ef Fix for referencingmultiplexer 2020-08-25 15:43:12 -06:00
James Betker
19487d9bbd Fix distributed launch for large distributed runs 2020-08-25 15:42:59 -06:00
James Betker
03eb29a4d9 Fix LQGT dataset 2020-08-25 11:57:25 -06:00
James Betker
a65b07607c Reference network 2020-08-25 11:56:59 -06:00
James Betker
f224907603 Fix LQGT_dataset, add full_image_dataset 2020-08-24 17:12:43 -06:00
James Betker
5ec04aedc8 Let noise be configurable
LQ noise is not currently configurable for some reason..
2020-08-24 15:00:14 -06:00
James Betker
f9276007a8 More fixes to corrupt_fea 2020-08-23 17:52:18 -06:00
James Betker
0005c56cd4 dbg 2020-08-23 17:43:03 -06:00
James Betker
4bb5b3c981 corfea debugging 2020-08-23 17:39:02 -06:00
James Betker
7713cb8df5 Corrupted features in srgan 2020-08-23 17:32:03 -06:00
James Betker
dffc15184d More ExtensibleTrainer work
It runs now, just need to debug it to reach performance parity with SRGAN. Sweet.
2020-08-23 17:22:45 -06:00
James Betker
afdd93fbe9 Grey feature 2020-08-22 13:41:38 -06:00
James Betker
e59e712e39 More ExtensibleTrainer work 2020-08-22 13:08:33 -06:00
James Betker
f40545f235 ExtensibleTrainer work 2020-08-22 08:24:34 -06:00
James Betker
a498d7b1b3 Report l_g_gan_grad before weight multiplication 2020-08-20 11:57:53 -06:00
James Betker
9d77a4db2e Allow initial temperature to be specified to SPSR net for inference 2020-08-20 11:57:34 -06:00
James Betker
24bdcc1181 Let SwitchedSpsr transform count be specified 2020-08-18 09:10:25 -06:00
James Betker
40bb0597bb misc 2020-08-18 08:50:24 -06:00
James Betker
74cdaa2226 Some work on extensible trainer 2020-08-18 08:49:32 -06:00
James Betker
0c98c61f4a Enable start_step to be specified 2020-08-15 18:34:59 -06:00
James Betker
868d0aa442 Undo early dim reduction on grad branch for SPSR_arch 2020-08-14 16:23:42 -06:00
James Betker
2d205f52ac Unite spsr_arch switched gens
Found a pretty good basis model.
2020-08-12 17:04:45 -06:00
James Betker
bdaa67deb7 Misc 2020-08-12 08:46:15 -06:00
James Betker
3d0ece804b SPSR LR2 2020-08-12 08:45:49 -06:00
James Betker
ab04ca1778 Extensible trainer (in progress) 2020-08-12 08:45:23 -06:00
James Betker
cb316fabc7 Use LR data for image gradient prediction when HR data is disjoint 2020-08-10 15:00:28 -06:00
James Betker
f0e2816239 Denoise attention maps 2020-08-10 14:59:58 -06:00
James Betker
59aba1daa7 LR switched SPSR arch
This variant doesn't do conv processing at HR, which should save
a ton of memory in inference. Lets see how it works.
2020-08-10 13:03:36 -06:00
James Betker
4e972144ae More attention fixes for switched_spsr 2020-08-07 21:11:50 -06:00
James Betker
d02509ef97 spsr_switched missing import 2020-08-07 21:05:29 -06:00
James Betker
887806ffa0 Finish up spsr_switched 2020-08-07 21:03:48 -06:00
James Betker
1d5f4f6102 Crossgan 2020-08-07 21:03:39 -06:00
James Betker
fd7b6ca0a9 Comptue gan_grad_branch.... 2020-08-06 12:11:40 -06:00
James Betker
30b16d5235 Update how branch GAN grad is disseminated 2020-08-06 11:13:02 -06:00
James Betker
1f21c02f8b Add cross-compare discriminator 2020-08-06 08:56:21 -06:00
James Betker
be272248af More RAGAN fixes 2020-08-05 16:47:21 -06:00
James Betker
26a6a5d512 Compute grad GAN loss against both the branch and final target, simplify pixel loss
Also fixes a memory leak issue where we weren't detaching our loss stats when
logging them. This stabilizes memory usage substantially.
2020-08-05 12:08:15 -06:00
James Betker
299ee13988 More RAGAN fixes 2020-08-05 11:03:06 -06:00
James Betker
b8a4df0a0a Enable RAGAN in SPSR, retrofit old RAGAN for efficiency 2020-08-05 10:34:34 -06:00
James Betker
3ab39f0d22 Several new spsr nets 2020-08-05 10:01:24 -06:00
James Betker
3c0a2d6efe Fix grad branch debug out 2020-08-04 16:43:43 -06:00
James Betker
ec2a795d53 Fix multistep optimizer (feeding from wrong config params) 2020-08-04 16:42:58 -06:00
James Betker
4bfbdaf94f Don't recompute generator outputs for D in standard operation
Should significantly improve training performance with negligible
results differences.
2020-08-04 11:28:52 -06:00
James Betker
11b227edfc Whoops 2020-08-04 10:30:40 -06:00
James Betker
6d25bcd5df Apply fixes to grad discriminator 2020-08-04 10:25:13 -06:00
James Betker
96d66f51c5 Update requirements 2020-08-03 16:57:56 -06:00
James Betker
c7e5d3888a Add pix_grad_branch loss to metrics 2020-08-03 16:21:05 -06:00
James Betker
0d070b47a7 Add simplified SPSR architecture
Basically just cleaning up the code, removing some bad conventions,
and reducing complexity somewhat so that I can play around with
this arch a bit more easily.
2020-08-03 10:25:37 -06:00
James Betker
47e24039b5 Fix bug that makes feature loss run even when it is off 2020-08-02 20:37:51 -06:00
James Betker
328afde9c0 Integrate SPSR into SRGAN_model
SPSR_model really isn't that different from SRGAN_model. Rather than continuing to re-implement
everything I've done in SRGAN_model, port the new stuff from SPSR over.

This really demonstrates the need to refactor SRGAN_model a bit to make it cleaner. It is quite the
beast these days..
2020-08-02 12:55:08 -06:00
James Betker
c8da78966b Substantial SPSR mods & fixes
- Added in gradient accumulation via mega-batch-factor
- Added AMP
- Added missing train hooks
- Added debug image outputs
- Cleaned up including removing GradientPenaltyLoss, custom SpectralNorm
- Removed all the custom discriminators
2020-08-02 10:45:24 -06:00
James Betker
f894ba8f98 Add SPSR_module
This is a port from the SPSR repo, it's going to need a lot of work to be properly integrated
but as of this commit it at least runs.
2020-08-01 22:02:54 -06:00
James Betker
f33ed578a2 Update how attention_maps are created 2020-08-01 20:23:46 -06:00
James Betker
c139f5cd17 More torch 1.6 fixes 2020-07-31 17:03:20 -06:00
James Betker
a66fbb32b6 Fix fixed_disc DataParallel issue 2020-07-31 16:59:23 -06:00
James Betker
8dd44182e6 Fix scale torch warning 2020-07-31 16:56:04 -06:00
James Betker
bcebed19b7 Fix pixdisc bugs 2020-07-31 16:38:14 -06:00
James Betker
eb11a08d1c Enable disjoint feature networks
This is done by pre-training a feature net that predicts the features
of HR images from LR images. Then use the original feature network
and this new one in tandem to work only on LR/Gen images.
2020-07-31 16:29:47 -06:00
James Betker
6e086d0c20 Fix fixed_disc 2020-07-31 15:07:10 -06:00
James Betker
d5fa059594 Add capability to have old discriminators serve as feature networks 2020-07-31 14:59:54 -06:00
James Betker
6b45b35447 Allow multi_step_lr_scheduler to load a new LR schedule when restoring state 2020-07-31 11:21:11 -06:00
James Betker
e37726f302 Add feature_model for training custom feature nets 2020-07-31 11:20:39 -06:00
James Betker
7629cb0e61 Add FDPL Loss
New loss type that can replace PSNR loss. Works against the frequency domain
and focuses on frequency features loss during hr->lr conversion.
2020-07-30 20:47:57 -06:00
James Betker
85ee64b8d9 Turn down feadisc intensity
Honestly - this feature is probably going to be removed soon, so backwards
compatibility is not a huge deal anymore.
2020-07-27 15:28:55 -06:00
James Betker
ebb199e884 Get rid of safety valve (probably being encountered in val) 2020-07-26 22:51:59 -06:00
James Betker
0892d5fe99 LQGT_dataset gan debug 2020-07-26 22:48:35 -06:00
James Betker
d09ed4e5f7 Misc fixes 2020-07-26 22:44:24 -06:00
James Betker
c54784ae9e Fix feature disc log item error 2020-07-26 22:25:59 -06:00
James Betker
9a8f227501 Allow separate dataset to pushed in for GAN-only training 2020-07-26 21:44:45 -06:00
James Betker
b06e1784e1 Fix SRG4 & switch disc
"fix". hehe.
2020-07-25 17:16:54 -06:00
James Betker
e6e91a1d75 Add SRG4
Back to the idea that maybe what we need is a hybrid
approach between pure switches and RDB.
2020-07-24 20:32:49 -06:00
James Betker
3320ad685f Fix mega_batch_factor not set for test 2020-07-24 12:26:44 -06:00
James Betker
c50cce2a62 Add an abstract, configurabler weight scheduling class and apply it to the feature weight 2020-07-23 17:03:54 -06:00
James Betker
9ccf771629 Fix feature validation, wrong device
Only shows up in distributed training for some reason.
2020-07-23 10:16:34 -06:00
James Betker
a7541b6d8d Fix illegal tb_logger use in distributed training 2020-07-23 09:14:01 -06:00
James Betker
bba283776c Enable find_unused_parameters for DistributedDataParallel
attention_norm has some parameters which are not used to compute grad,
which is causing failures in the distributed case.
2020-07-23 09:08:13 -06:00
James Betker
dbf6147504 Add switched discriminator
The logic is that the discriminator may be incapable of providing a truly
targeted loss for all image regions since it has to be too generic
(basically the same argument for the switched generator). So add some
switches in! See how it works!
2020-07-22 20:52:59 -06:00
James Betker
8a0a1569f3 Enable force_multiple in LQ_dataset 2020-07-22 20:51:16 -06:00
James Betker
106b8da315 Assert that temperature is set properly in eval mode. 2020-07-22 20:50:59 -06:00
James Betker
c74b9ee2e4 Add a way to disable grad on portions of the generator graph to save memory 2020-07-22 11:40:42 -06:00
James Betker
e3adafbeac Add convert_model.py and a hacky way to add extra layers to a model 2020-07-22 11:39:45 -06:00
James Betker
7f7e17e291 Update feature discriminator further
Move the feature/disc losses closer and add a feature computation layer.
2020-07-20 20:54:45 -06:00
James Betker
46aa776fbb Allow feature discriminator unet to only output closest layer to feature output 2020-07-19 19:05:08 -06:00
James Betker
8a9f215653 Huge set of mods to support progressive generator growth 2020-07-18 14:18:48 -06:00
James Betker
47a525241f Make attention norm optional 2020-07-18 07:24:02 -06:00
James Betker
ad97a6a18a Progressive SRG first check-in 2020-07-18 07:23:26 -06:00
James Betker
b08b1cad45 Fix feature decay 2020-07-16 23:27:06 -06:00
James Betker
3e7a83896b Fix pixgan debugging issues 2020-07-16 11:45:19 -06:00
James Betker
a1bff64d1a More fixes 2020-07-16 10:48:48 -06:00
James Betker
240f254263 More loss fixes 2020-07-16 10:45:50 -06:00
James Betker
6cfa67d831 Fix featuredisc broadcast error 2020-07-16 10:18:30 -06:00
James Betker
8d061a2687 Add u-net discriminator with feature output 2020-07-16 10:10:09 -06:00
James Betker
0c4c388e15 Remove dualoutputsrg
Good idea, didn't pan out.
2020-07-16 10:09:24 -06:00
James Betker
4bcc409fc7 Fix loadSRG2 typo 2020-07-14 10:20:53 -06:00