Commit Graph

791 Commits

Author SHA1 Message Date
James Betker
dcfe994fee Add standalone srg2_classic
Trying to investigate how I was so misguided. I *thought* srg2 was considerably
better than RRDB in performance but am not actually seeing that.
2020-10-31 20:55:34 -06:00
James Betker
ea8c20c0e2 Fix bug with multiscale_dataset 2020-10-31 20:54:41 -06:00
James Betker
bb39d3efe5 Bump image corruption factor a bit 2020-10-31 20:50:24 -06:00
James Betker
eb7df63592 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-31 11:09:32 -06:00
James Betker
c2866ad8d2 Disable debugging of comparable pingpong generations 2020-10-31 11:09:10 -06:00
James Betker
7303d8c932 Add psnr approximator 2020-10-31 11:08:55 -06:00
James Betker
565517814e Restore SRG2
Going to try to figure out where SRG lost competitiveness to RRDB..
2020-10-30 14:01:56 -06:00
James Betker
b24ff3c88d Fix bug that causes multiscale dataset to crash 2020-10-30 14:01:24 -06:00
James Betker
74738489b9 Fixes and additional support for progressive zoom 2020-10-30 09:59:54 -06:00
James Betker
a3918fa808 Tecogan & other fixes 2020-10-30 00:19:58 -06:00
James Betker
b316078a15 Fix tecogan_losses fp16 2020-10-29 23:02:20 -06:00
James Betker
3791f95ad0 Enable RRDB to take in reference inputs 2020-10-29 11:07:40 -06:00
James Betker
7d38381d46 Add scaling to rrdb 2020-10-29 09:48:10 -06:00
James Betker
607ff3c67c RRDB with bypass 2020-10-29 09:39:45 -06:00
James Betker
1655b9e242 Fix fast_forward teco loss bug 2020-10-28 17:49:54 -06:00
James Betker
25b007a0f5 Increase jpeg corruption & add error 2020-10-28 17:37:39 -06:00
James Betker
796659b0ac Add 'jpeg-normal' corruption 2020-10-28 16:40:47 -06:00
James Betker
515905e904 Add a min_loss that is DDP compatible 2020-10-28 15:46:59 -06:00
James Betker
f133243ac8 Extra logging for teco_resgen 2020-10-28 15:21:22 -06:00
James Betker
2ab5054d4c Add noise to teco disc 2020-10-27 22:48:23 -06:00
James Betker
4dc16d5889 Upgrade tecogan_losses for speed 2020-10-27 22:40:15 -06:00
James Betker
ac3da0c5a6 Make tecogen functional 2020-10-27 21:08:59 -06:00
James Betker
10da206db6 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:59 -06:00
James Betker
9848f4c6cb Add teco_resgen 2020-10-27 20:59:55 -06:00
James Betker
543c384a91 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:16 -06:00
James Betker
da53090ce6 More adjustments to support distributed training with teco & on multi_modal_train 2020-10-27 20:58:03 -06:00
James Betker
00bb568956 further checkpointify spsr_arch 2020-10-27 17:54:28 -06:00
James Betker
c2727a0150 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 15:24:19 -06:00
James Betker
2a3eec8fd7 Fix some distributed training snafus 2020-10-27 15:24:05 -06:00
James Betker
d923a62ed3 Allow SPSR to checkpoint 2020-10-27 15:23:20 -06:00
James Betker
11a9e223a6 Retrofit SPSR_arch so it is capable of accepting a ref 2020-10-27 11:14:36 -06:00
James Betker
8202ee72b9 Re-add original SPSR_arch 2020-10-27 11:00:38 -06:00
James Betker
31cf1ac98d Retrofit full_image_dataset to work with new arch. 2020-10-27 10:26:19 -06:00
James Betker
ade0a129da Include psnr in test.py 2020-10-27 10:25:42 -06:00
James Betker
231137ab0a Revert RRDB back to original model 2020-10-27 10:25:31 -06:00
James Betker
1ce863849a Remove temporary base_model change 2020-10-26 11:13:01 -06:00
James Betker
54accfa693 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-26 11:12:37 -06:00
James Betker
ff58c6484a Fixes to unified chunk datasets to support stereoscopic training 2020-10-26 11:12:22 -06:00
James Betker
b2f803588b Fix multi_modal_train.py 2020-10-26 11:10:22 -06:00
James Betker
f857eb00a8 Allow tecogan losses to compute at 32px 2020-10-26 11:09:55 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
85c07f85d9 Update flownet submodule 2020-10-24 11:59:00 -06:00
James Betker
327cdbe110 Support configurable multi-modal training 2020-10-24 11:57:39 -06:00
James Betker
9c3d059ef0 Updates to be able to train flownet2 in ExtensibleTrainer
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
1dbcbfbac8 Restore ChainedEmbeddingGenWithStructure
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
8e5b6682bf Add PairedFrameDataset 2020-10-23 20:58:07 -06:00
James Betker
7a75d10784 Arch cleanup 2020-10-23 09:35:33 -06:00
James Betker
646d6a621a Support 4x zoom on ChainedEmbeddingGen 2020-10-23 09:25:58 -06:00
James Betker
8636492db0 Copy train.py mods to train2 2020-10-22 17:16:36 -06:00
James Betker
e9c0b9f0fd More adjustments to support multi-modal training
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f Class-ify train.py and workon multi-modal trainer 2020-10-22 16:15:31 -06:00
James Betker
15e00e9014 Finish integration with autocast
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3 Add multi-modal trainer 2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8 Fix multifaceted chain gen 2020-10-22 13:27:06 -06:00
James Betker
f9dc472f63 Misc nonfunctional mods to datasets 2020-10-22 10:16:17 -06:00
James Betker
43c4f92123 Collapse progressive zoom candidates into the batch dimension
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
James Betker
680d635420 Enable ExtensibleTrainer to skip steps when state keys are missing 2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1 Add FFT injector 2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms 2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0 Allow recurrent losses to be weighted 2020-10-21 16:59:44 -06:00
James Betker
5753e77d67 ChainedGen: Output debugging information on blocks 2020-10-21 16:36:23 -06:00
James Betker
b54de69153 Misc 2020-10-21 11:08:21 -06:00
James Betker
71c3820d2d Fix process_video 2020-10-21 11:08:12 -06:00
James Betker
3c6e600e48 Add capacity for models to self-report visuals 2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b Add bypass to ChainedEmbeddingGen 2020-10-21 11:07:45 -06:00
James Betker
d8c6a4bbb8 Misc 2020-10-20 12:56:52 -06:00
James Betker
aba83e7497 Don't apply jpeg corruption & noise corruption together
This causes some severe noise.
2020-10-20 12:56:35 -06:00
James Betker
111450f4e7 Use areal interpolate for multiscale_dataset 2020-10-19 15:30:25 -06:00
James Betker
a63bf2ea2f Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086 Restore test.py for use as standalone validator 2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8 Fix recurrent=None bug in ChainedEmbeddingGen 2020-10-19 15:25:12 -06:00
James Betker
331c40f0c8 Allow starting step to be forced
Useful for testing purposes or to force a validation.
2020-10-19 15:23:04 -06:00
James Betker
8ca566b621 Revert "Misc"
This reverts commit 0e3ea63a14.

# Conflicts:
#	codes/test.py
#	codes/train.py
2020-10-19 13:34:54 -06:00
James Betker
b28e4d9cc7 Add spread loss
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
9b9a6e5925 Add get_paths() to base_unsupervised_image_dataset 2020-10-19 11:30:06 -06:00
James Betker
981d64413b Support validation over a custom injector
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
ffad0e0422 Allow image corruption in multiscale dataset 2020-10-19 10:10:27 -06:00
James Betker
668cafa798 Push correct patch of recurrent embedding to upstream image, rather than whole thing 2020-10-18 22:39:52 -06:00
James Betker
7df378a944 Remove separated vgg discriminator
Checkpointing happens inline instead. Was a dumb idea..

Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5 Fix memory leak with recurrent loss 2020-10-18 10:22:10 -06:00
James Betker
552e70a032 Get rid of excessive checkpointed disc params 2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813 Add a checkpointable discriminator 2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08 Multiscale training in! 2020-10-17 22:54:12 -06:00
James Betker
e706911c83 Fix spinenet bug 2020-10-17 20:20:36 -06:00
James Betker
b008a27d39 Spinenet should allow bypassing the initial conv
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c7f3fc4dd9 Enable chunk_with_reference to work without centers
Moving away from this so it doesn't matter too much. Also fixes an issue
with the "ignore" flag.
2020-10-17 20:09:08 -06:00
James Betker
b45e132a9d Allow first n tiles to be ignored
Helps zoom in with chunked dataset
2020-10-17 09:45:03 -06:00
James Betker
c1c9c5681f Swap recurrence 2020-10-17 08:40:28 -06:00
James Betker
6141aa1110 More recurrence fixes for chainedgen 2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b Allow recurrence to specified for chainedgen 2020-10-17 08:32:29 -06:00
James Betker
fc4c064867 Add recurrent support to chainedgenwithstructure 2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2 Don't use several stages of spinenet_arch
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339 Go back to torch's DDP
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e Add ChainedGenWithStructure 2020-10-16 20:44:36 -06:00
James Betker
96f1be30ed Add use_generator_as_filter 2020-10-16 20:43:55 -06:00
James Betker
617d97e19d Add ChainedEmbeddingGen 2020-10-15 23:18:08 -06:00
James Betker
c4543ce124 Set post_transform_block to None where applicable 2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb SSGSimpler network 2020-10-15 17:18:44 -06:00
James Betker
1ba01d69b5 Move datasets to INTER_AREA interpolation for downsizing
Looks **FAR** better visually
2020-10-15 17:18:23 -06:00
James Betker
d56745b2ec JPEG-broad adjustment 2020-10-15 10:14:51 -06:00
James Betker
eda75c9779 Cleanup fixes 2020-10-15 10:13:17 -06:00
James Betker
920865defb Arch work 2020-10-15 10:13:06 -06:00
James Betker
1dc0b05428 Add multiscale dataset 2020-10-15 10:12:50 -06:00
James Betker
0f4e03183f New image corruptor gradations 2020-10-15 10:12:25 -06:00
James Betker
1f20d59c31 Revert big switch back 2020-10-14 11:03:34 -06:00
James Betker
9815980329 Update SwitchedConv 2020-10-13 20:57:12 -06:00
James Betker
24792bdb4f Codebase cleanup
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba Mods to support video processing with teco networks 2020-10-13 20:47:05 -06:00
James Betker
17d78195ee Mods to SRG to support returning switch logits 2020-10-13 20:46:37 -06:00
James Betker
cc915303a5 Fix SPSR calls into SwitchComputer 2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899 Merge remote-tracking branch 'origin/gan_lab' into gan_lab
# Conflicts:
#	codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9 Add the "BigSwitch" 2020-10-13 10:11:10 -06:00
James Betker
8014f050ac Clear metrics properly
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-12 17:43:51 -06:00
James Betker
731700ab2c checkpoint in ssg 2020-10-12 17:43:28 -06:00
James Betker
ca523215c6 Fix recurrent std in arch 2020-10-12 17:42:32 -06:00
James Betker
05377973bf Allow initial recurrent input to be specified (optionally) 2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6 Add ssgr1 recurrence 2020-10-12 17:18:19 -06:00
James Betker
c1a00f31b7 Update switched_conv 2020-10-12 10:37:45 -06:00
James Betker
d7d7590f3e Fix constant injector - wasn't working in test 2020-10-12 10:36:30 -06:00
James Betker
e7cf337dba Fix bug with chunk_with_reference 2020-10-12 10:23:03 -06:00
James Betker
ce163ad4a9 Update SSGdeep 2020-10-12 10:22:08 -06:00
James Betker
2bc5701b10 misc 2020-10-12 10:21:25 -06:00
James Betker
3409d88a1c Add PANet arch 2020-10-12 10:20:55 -06:00
James Betker
7cbf4fa665 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-11 08:33:30 -06:00
James Betker
92cb83958a Return zeros rather than None when image cant be read 2020-10-11 08:33:18 -06:00
James Betker
a9c2e97391 Constant injector and teco fixes 2020-10-11 08:20:07 -06:00
James Betker
e785029936 Mods needed to support SPSR archs with teco gan 2020-10-10 22:39:55 -06:00
James Betker
120072d464 Add constant injector 2020-10-10 21:50:23 -06:00
James Betker
f99812e14d Fix tecogan_losses errors 2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7 Alter teco_losses to feed a recurrent input in as separate 2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d Add MarginRemoval injector 2020-10-09 20:35:56 -06:00
James Betker
0011d445c8 Fix loss indexing 2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc For element loss added 2020-10-09 19:51:44 -06:00
James Betker
61e5047c60 Fix loss accumulator when buffers are not filled
They were reporting incorrect losses.
2020-10-09 19:47:59 -06:00
James Betker
fe50d6f9d0 Fix attention images 2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c Allow tecogan to be used in process_video 2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69 Add network architecture built for teco 2020-10-09 08:40:14 -06:00
James Betker
b3d0baaf17 Improve multiframe dataset memory usage 2020-10-09 08:40:00 -06:00
James Betker
afe6af88af Fix attention print issue 2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4 Converge SSG architectures into unified switching base class
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
3cc56cd00b Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-08 16:12:05 -06:00
James Betker
7d8d9dafbb misc 2020-10-08 16:12:00 -06:00
James Betker
856ef4d21d Update switched_conv 2020-10-08 16:10:23 -06:00
James Betker
1eb516d686 Fix more distributed bugs 2020-10-08 14:32:45 -06:00
James Betker
b36ba0460c Fix multi-frame dataset OBO error 2020-10-08 12:21:04 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5 Allow tecogan to support generators that only output a tensor (instead of a list) 2020-10-08 09:26:25 -06:00
James Betker
969bcd9021 Use local checkpoint in SSG 2020-10-08 08:54:46 -06:00