Commit Graph

1573 Commits

Author SHA1 Message Date
James Betker
aae65e6ed8 Mods to byol_resnet_playground for large batches 2021-01-01 11:59:54 -07:00
James Betker
e992e18767 Add initial_stride term to style_sr
Also fix fid and a networks.py issue.
2021-01-01 11:59:36 -07:00
James Betker
9864fe4c04 Fix for train.py 2021-01-01 11:59:00 -07:00
James Betker
e214e6ce33 Styled SR model 2020-12-31 20:54:18 -07:00
James Betker
0eb1f4dd67 Revert "Get rid of CUDA_VISIBLE_DEVICES"
It is actually necessary for training in distributed mode. Only
do it then.
2020-12-31 10:31:40 -07:00
James Betker
8de5a02a48 byol_resnet_playground
Similar to the spinenet playground, but tinkers with resnet instead
2020-12-31 10:15:04 -07:00
James Betker
8f18b2709e Get rid of CUDA_VISIBLE_DEVICES
It is not clear to me what the purpose of this is, but it has recently
started causing failures.
2020-12-31 10:13:58 -07:00
James Betker
1de1fa30ac Disable refs and centers altogether in single_image_dataset
I suspect that this might be a cause of failures on parallel datasets.
Plus it is unnecessary computation.
2020-12-31 10:13:24 -07:00
James Betker
8f0984cacf Add sr_fid evaluator 2020-12-30 20:18:58 -07:00
James Betker
b1fb82476b Add gp debug (fix) 2020-12-30 15:26:54 -07:00
James Betker
9c53314ea2 Add gradient penalty visual debug 2020-12-30 09:51:59 -07:00
James Betker
63cf3d3126 Injector auto-registration
I love it!
2020-12-29 20:58:02 -07:00
James Betker
a777c1e4f9 Misc script fixes 2020-12-29 20:25:09 -07:00
James Betker
9dc3c8f0ff Script updates 2020-12-29 20:24:41 -07:00
James Betker
ba543d1152 Glean mods
- Fixes fixed upscale factor issues
- Refines a few ops to decrease computation & parameterization
2020-12-27 12:25:06 -07:00
James Betker
5e2e605a50 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-12-26 13:51:19 -07:00
James Betker
f9be049adb GLEAN mod to support custom initial strides 2020-12-26 13:51:14 -07:00
James Betker
2706a84f15 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-12-26 13:50:34 -07:00
James Betker
90e2362c00 Fix bug with full_image_dataset 2020-12-26 13:50:27 -07:00
James Betker
3fd627fc62 Mods to support image classification & filtering 2020-12-26 13:49:27 -07:00
James Betker
10fdfa1563 Migrate generators to dynamic model registration 2020-12-24 23:02:10 -07:00
James Betker
29db7c7a02 Further mods to BYOL 2020-12-24 09:28:41 -07:00
James Betker
036684893e Add LARS optimizer & support for BYOL idiosyncrasies
- Added LARS and SGD optimizer variants that support turning off certain
  features for BN and bias layers
- Added a variant of pytorch's resnet model that supports gradient checkpointing.
- Modify the trainer infrastructure to support above
- Fix bug with BYOL (should have been nonfunctional)
2020-12-23 20:33:43 -07:00
James Betker
1bbcb96ee8 Implement a few changes to support training BYOL networks 2020-12-23 10:50:23 -07:00
James Betker
2437b33e74 Fix srflow_latent_space_playground bug 2020-12-22 15:42:38 -07:00
James Betker
e7aeb17404 ImageFolder dataset: allow intermediary downscale before corrupt
For massive upscales (ex: 8x), corruption does almost nothing when applied
at the HQ level. This patch adds support to perform corruption at a specified
intermediary scale. The dataset downscales to this level, performs the corruption,
then downscales the rest of the way to get the LQ image.
2020-12-22 15:42:21 -07:00
James Betker
7938f9f50b Fix bug with single_image_dataset which prevented working on multiple directories from working 2020-12-19 15:13:46 -07:00
James Betker
ae666dc520 Fix bugs with srflow after refactor 2020-12-19 10:28:23 -07:00
James Betker
4328c2f713 Change default ReLU slope to .2 BREAKS COMPATIBILITY
This conforms my ConvGnLelu implementation with the generally accepted negative_slope=.2. I have no idea where I got .1. This will break backwards compatibility with some older models but will likely improve their performance when freshly trained. I did some auditing to find what these models might be, and I am not actively using any of them, so probably OK.
2020-12-19 08:28:03 -07:00
James Betker
9377d34ac3 glean mods 2020-12-19 08:26:07 -07:00
James Betker
f35c034fa5 Add trainer readme 2020-12-18 16:52:16 -07:00
James Betker
e82f4552db Update other docs with dumb config options 2020-12-18 16:21:28 -07:00
James Betker
92f9a129f7 GLEAN! 2020-12-18 16:04:19 -07:00
James Betker
c717765bcb Notes for lucidrains converter. 2020-12-18 09:55:38 -07:00
James Betker
b4720ea377 Move stylegan to new location 2020-12-18 09:52:36 -07:00
James Betker
1708136b55 Commit my attempt at "conforming" the lucidrains stylegan implementation to the reference spec. Not working. will probably be abandoned. 2020-12-18 09:51:48 -07:00
James Betker
209332292a Rosinality stylegan fix 2020-12-18 09:50:41 -07:00
James Betker
d875ca8342 More refactor changes 2020-12-18 09:24:31 -07:00
James Betker
5640e4efe4 More refactoring 2020-12-18 09:18:34 -07:00
James Betker
b905b108da Large cleanup
Removed a lot of old code that I won't be touching again. Refactored some
code elements into more logical places.
2020-12-18 09:10:44 -07:00
James Betker
2f0a52b7db misc changes 2020-12-18 08:53:45 -07:00
James Betker
a8179ff53c Image label work 2020-12-18 08:53:18 -07:00
James Betker
3074f41877 Get rosinality model converter to work
Mostly, just needed to remove the custom cuda ops, not so bueno on Windows.
2020-12-17 16:03:39 -07:00
James Betker
e838c6e75b Rosinality stylegan2 port 2020-12-17 14:18:46 -07:00
James Betker
12cf052889 Add an image patch labeling UI 2020-12-17 10:16:21 -07:00
James Betker
49327b99fe SRFlow outputs RRDB output 2020-12-16 10:28:02 -07:00
James Betker
c25b49bb12 Clean up of SRFlowNet_arch 2020-12-16 10:27:38 -07:00
James Betker
42ac8e3eeb Remove unnecessary comment from SRFlowNet 2020-12-16 09:43:07 -07:00
James Betker
fb2cfc795b Update requirements, add image_patch_classifier tool 2020-12-16 09:42:50 -07:00
James Betker
09de3052ac Add softmax to spinenet classification head 2020-12-16 09:42:15 -07:00
James Betker
4310e66848 Fix bug in 'corrupt_before_downsize=true' 2020-12-16 09:41:59 -07:00
James Betker
8661207d57 Merge branch 'gan_lab' of https://github.com/neonbjb/DL-Art-School into gan_lab 2020-12-15 17:16:48 -07:00
James Betker
fc376d34b2 Spinenet with logits head 2020-12-15 17:16:19 -07:00
James Betker
8e0e883050 Mods to support labeled datasets & random augs for those datasets 2020-12-15 17:15:56 -07:00
James Betker
e5a3e6b9b5 srflow latent space misc 2020-12-14 23:59:49 -07:00
James Betker
1e14635d88 Add exclusions to extract_subimages_with_ref 2020-12-14 23:59:41 -07:00
James Betker
0a19e53df0 BYOL mods 2020-12-14 23:59:11 -07:00
James Betker
ef7eabf457 Allow RRDB to upscale 8x 2020-12-14 23:58:52 -07:00
James Betker
087e9280ed Add labeling feature to image_folder_dataset 2020-12-14 23:58:37 -07:00
James Betker
ec0ee25f4b Structural latents checkpoint 2020-12-11 12:01:09 -07:00
James Betker
26ceca68c0 BYOL with structure! 2020-12-10 15:07:35 -07:00
James Betker
9c5e272a22 Script to extract models from a wrapped BYOL model 2020-12-10 09:57:52 -07:00
James Betker
a5630d282f Get rid of 2nd trainer 2020-12-10 09:57:38 -07:00
James Betker
8e4b9f42fd New BYOL dataset which uses a form of RandomCrop that lends itself to
structural guidance to the latents.
2020-12-10 09:57:18 -07:00
James Betker
c203cee31e Allow swapping to torch DDP as needed in code 2020-12-09 15:03:59 -07:00
James Betker
66cbae8731 Add random_dataset for testing 2020-12-09 14:55:05 -07:00
James Betker
97ff25a086 BYOL!
Man, is there anything ExtensibleTrainer can't train? :)
2020-12-08 13:07:53 -07:00
James Betker
5369cba8ed Stage 2020-12-08 00:33:07 -07:00
James Betker
bca59ed98a Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-12-07 12:51:04 -07:00
James Betker
ea56eb61f0 Fix DDP errors for discriminator
- Don't define training_net in define_optimizers - this drops the shell and leads to problems downstream
- Get rid of support for multiple training nets per opt. This was half baked and needs a better solution if needed
   downstream.
2020-12-07 12:50:57 -07:00
James Betker
c0aeaabc31 Spinenet playground 2020-12-07 12:49:32 -07:00
James Betker
88fc049c8d spinenet latent playground! 2020-12-05 20:30:36 -07:00
James Betker
11155aead4 Directly use dataset keys
This has been a long time coming. Cleans up messy "GT" nomenclature and simplifies ExtensibleTraner.feed_data
2020-12-04 20:14:53 -07:00
James Betker
8a83b1c716 Go back to apex DDP, fix distributed bugs 2020-12-04 16:39:21 -07:00
James Betker
7a81d4e2f4 Revert gaussian loss changes 2020-12-04 12:49:20 -07:00
James Betker
711780126e Cleanup 2020-12-03 23:42:51 -07:00
James Betker
ac7256d4a3 Do tqdm reporting when calculating flow_gaussian_nll 2020-12-03 23:42:29 -07:00
James Betker
dc9ff8e05b Allow the majority of the srflow steps to checkpoint 2020-12-03 23:41:57 -07:00
James Betker
06d1c62c5a iGPT support!
Sweeeeet
2020-12-03 15:32:21 -07:00
James Betker
c18adbd606 Delete mdcn & panet
Garbage, all of it.
2020-12-02 22:25:57 -07:00
James Betker
f2880b33c9 Get rid of mean shift from MDCN 2020-12-02 14:18:33 -07:00
James Betker
8a00f15746 Implement FlowGaussianNll evaluator 2020-12-02 14:09:54 -07:00
James Betker
edf408508c Fix discriminator 2020-12-01 17:45:56 -07:00
James Betker
c963e5f2ce Add ImageFolderDataset
This one has been a long time coming.. How does torch not have something like this?
2020-12-01 17:45:37 -07:00
James Betker
9a421a41f4 SRFlow: accomodate mismatches between global scale and flow_scale 2020-12-01 11:11:51 -07:00
James Betker
8f65f81ddb Adjustments to subimage extractor 2020-12-01 11:11:30 -07:00
James Betker
e343722d37 Add stepped rrdb 2020-12-01 11:11:15 -07:00
James Betker
2e0bbda640 Remove unused archs 2020-12-01 11:10:48 -07:00
James Betker
a1c8300052 Add mdcn 2020-11-30 16:14:21 -07:00
James Betker
1e0f69e34b extra_conv in gn discriminator, multiframe support in rrdb. 2020-11-29 15:39:50 -07:00
James Betker
da604752e6 Misc RRDB changes 2020-11-29 12:21:31 -07:00
James Betker
f2422f1d75 Latent space playground 2020-11-29 09:33:29 -07:00
James Betker
a1d4c9f83c multires rrdb work 2020-11-28 14:35:46 -07:00
James Betker
929cd45c05 Fix for RRDB scale 2020-11-27 21:37:10 -07:00
James Betker
71fa532356 Adjustments to how flow networks set size and scale 2020-11-27 21:37:00 -07:00
James Betker
6f958bb150 Maybe this is necessary after all? 2020-11-27 15:21:13 -07:00
James Betker
ef8d5f88c1 Bring split gaussian nll out of split so it can be computed accurately with the rest of the nll component 2020-11-27 13:30:21 -07:00
James Betker
11d2b70bdd Latent space playground work 2020-11-27 12:03:16 -07:00
James Betker
4ab49b0d69 RRDB disc work 2020-11-27 12:03:08 -07:00
James Betker
6de4dabb73 Remove srflow (modified version)
Starting from orig and re-working from there.
2020-11-27 12:02:06 -07:00
James Betker
5f5420ff4a Update to srflow_latent_space_playground 2020-11-26 20:31:21 -07:00
James Betker
fd356580c0 Play with lambdas 2020-11-26 20:30:55 -07:00
James Betker
0c6d7971b9 Dataset documentation 2020-11-26 11:58:39 -07:00
James Betker
45a489110f Fix datasets 2020-11-26 11:50:38 -07:00
James Betker
5edaf085e0 Adjustments to latent_space_playground 2020-11-25 15:52:36 -07:00
James Betker
205c9a5335 Learn how to functionally use srflow networks 2020-11-25 13:59:06 -07:00
James Betker
cb045121b3 Expose srflow rrdb 2020-11-24 13:20:20 -07:00
James Betker
f3c1fc1bcd Dataset modifications 2020-11-24 13:20:12 -07:00
James Betker
f6098155cd Mods to tecogan to allow use of embeddings as input 2020-11-24 09:24:02 -07:00
James Betker
b10bcf6436 Rework stylegan_for_sr to incorporate structure as an adain block 2020-11-23 11:31:11 -07:00
James Betker
519ba6f10c Support 2x RRDB with 4x srflow 2020-11-21 14:46:15 -07:00
James Betker
cad92bada8 Report logp and logdet for srflow 2020-11-21 10:13:05 -07:00
James Betker
c37d3faa58 More adjustments to srflow_orig 2020-11-20 19:38:33 -07:00
James Betker
d51d12a41a Adjustments to srflow to (maybe?) fix training 2020-11-20 14:44:24 -07:00
James Betker
6c8c35ac47 Support training RRDB encoder [srflow] 2020-11-20 10:03:06 -07:00
James Betker
5ccdbcefe3 srflow_orig integration 2020-11-19 23:47:24 -07:00
James Betker
f80acfcab6 Throw if dataset isn't going to work with force_multiple setting 2020-11-19 23:47:00 -07:00
James Betker
2b2d754d8e Bring in an original SRFlow implementation for reference 2020-11-19 21:42:39 -07:00
James Betker
1e0d7be3ce "Clean up" SRFlow 2020-11-19 21:42:24 -07:00
James Betker
d7877d0a36 Fixes to teco losses and translational losses 2020-11-19 11:35:05 -07:00
James Betker
b2a05465fc Fix missing requirements 2020-11-18 10:16:39 -07:00
James Betker
5c10264538 Remove pyramid_disc hard dependencies 2020-11-17 18:34:11 -07:00
James Betker
6b679e2b51 Make grad_penalty available to classical discs 2020-11-17 18:31:40 -07:00
James Betker
8a19c9ae15 Add additive mode to rrdb 2020-11-16 20:45:09 -07:00
James Betker
2a507987df Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-15 16:16:30 -07:00
James Betker
931ed903c1 Allow combined additive loss 2020-11-15 16:16:18 -07:00
James Betker
4b68116977 import fix 2020-11-15 16:15:42 -07:00
James Betker
98eada1e4c More circular dependency fixes + unet fixes 2020-11-15 11:53:35 -07:00
James Betker
e587d549f7 Fix circular imports 2020-11-15 11:32:35 -07:00
James Betker
99f0cfaab5 Rework stylegan2 divergence losses
Notably: include unet loss
2020-11-15 11:26:44 -07:00
James Betker
ea94b93a37 Fixes for unet 2020-11-15 10:38:33 -07:00
James Betker
89f56b2091 Fix another import 2020-11-14 22:10:45 -07:00
James Betker
9af049c671 Import fix for unet 2020-11-14 22:09:18 -07:00
James Betker
5cade6b874 Move stylegan2 around, bring in unet 2020-11-14 22:04:48 -07:00
James Betker
4c6b14a3f8 Allow extract_square_images to work on multiple images 2020-11-14 20:24:05 -07:00
James Betker
125cb16dce Add a FID evaluator for stylegan with structural guidance 2020-11-14 20:16:07 -07:00
James Betker
c9258e2da3 Alter how structural guidance is given to stylegan 2020-11-14 20:15:48 -07:00
James Betker
3397c83447 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-14 09:30:09 -07:00
James Betker
423ee7cb90 Allow attention to be specified for stylegan2 2020-11-14 09:29:53 -07:00
James Betker
ec621c69b5 Fix train bug 2020-11-14 09:29:08 -07:00
James Betker
cdc5ac30e9 oddity 2020-11-13 20:11:57 -07:00
James Betker
f406a5dd4c Mods to support stylegan2 in SR mode 2020-11-13 20:11:50 -07:00
James Betker
9c3d0b7560 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-13 20:10:47 -07:00
James Betker
67bf55495b Allow hq_batched_key to be specified 2020-11-13 20:10:12 -07:00
James Betker
0b96811611 Fix another issue with gpu ids getting thrown all over hte place 2020-11-13 20:05:52 -07:00
James Betker
c47925ae34 New image extractor utility 2020-11-13 11:04:03 -07:00
James Betker
a07e1a7292 Add separate Evaluator module and FID evaluator 2020-11-13 11:03:54 -07:00
James Betker
080ad61be4 Add option to work with nonrandom latents 2020-11-12 21:23:50 -07:00
James Betker
566b99ca75 GP adjustments for stylegan2 2020-11-12 16:44:51 -07:00
James Betker
fc55bdb24e Mods to how wandb are integrated 2020-11-12 15:45:25 -07:00
James Betker
44a19cd37c ExtensibleTrainer mods to support advanced checkpointing for stylegan2
Basically: stylegan2 makes use of gradient-based normalizers. These
make it so that I cannot use gradient checkpointing. But I love gradient
checkpointing. It makes things really, really fast and memory conscious.

So - only don't checkpoint when we run the regularizer loss. This is a
bit messy, but speeds up training by at least 20%.

Also: pytorch: please make checkpointing a first class citizen.
2020-11-12 15:45:07 -07:00
James Betker
db9e9e28a0 Fix an issue where GPU0 was always being used in non-ddp
Frankly, I don't understand how this has ever worked. WTF.
2020-11-12 15:43:01 -07:00
James Betker
2d3449d7a5 stylegan2 in ml art school! 2020-11-12 15:42:05 -07:00
James Betker
fd97573085 Fixes 2020-11-11 21:49:06 -07:00
James Betker
88f349bdf1 Enable usage of wandb 2020-11-11 21:48:56 -07:00
James Betker
1c065c41b4 Revert "..."
This reverts commit 4b92191880.
2020-11-11 17:24:27 -07:00
James Betker
4b92191880 ... 2020-11-11 14:12:40 -07:00
James Betker
12b57bbd03 Add residual blocks to pyramid disc 2020-11-11 13:56:45 -07:00
James Betker
b4136d766a Back to pyramids, no rrdb 2020-11-11 13:40:24 -07:00
James Betker
42a97de756 Convert PyramidRRDBDisc to RRDBDisc
Had numeric stability issues. This probably makes more sense anyways.
2020-11-11 12:14:14 -07:00
James Betker
72762f200c PyramidRRDB net 2020-11-11 11:25:49 -07:00
James Betker
a1760f8969 Adapt srg2 for video 2020-11-10 16:16:41 -07:00
James Betker
b742d1e5a5 When skipping steps via "every", still run nontrainable injection points 2020-11-10 16:09:17 -07:00
James Betker
91d27372e4 rrdb with adain latent 2020-11-10 16:08:54 -07:00
James Betker
6a2fd5f7d0 Lots of new discriminator nets 2020-11-10 16:06:54 -07:00
James Betker
4e5ba61ae7 SRG2classic further re-integration 2020-11-10 16:06:14 -07:00
James Betker
9e2c96ad5d More latent work 2020-11-07 20:38:56 -07:00
James Betker
6be6c92e5d Fix yet ANOTHER OBO error in multi_frame_dataset 2020-11-06 20:38:34 -07:00
James Betker
0cf52ef52c latent work 2020-11-06 20:38:23 -07:00
James Betker
34d319585c Add srflow arch 2020-11-06 20:38:04 -07:00
James Betker
4469d2e661 More work on RRDB with latent 2020-11-05 22:13:05 -07:00
James Betker
62d3b6496b Latent work checkpoint 2020-11-05 13:31:34 -07:00
James Betker
fd6cdba88f RRDB with latent 2020-11-05 10:04:17 -07:00
James Betker
df47d6cbbb More work in support of training flow networks in tandem with generators 2020-11-04 18:07:48 -07:00
James Betker
c21088e238 Fix OBO error in multi_frame_dataset
In some datasets, this meant one frame was included in a sequence where it didn't belong. In datasets with mismatched chunk sizes, this resulted in an error.
2020-11-03 14:32:06 -07:00
James Betker
e990be0449 Improve ignore_first logic 2020-11-03 11:56:32 -07:00
James Betker
658a267bab More work on SSIM/PSNR approximators
- Add a network that accomodates this style of approximator while retaining structure
- Migrate to SSIM approximation
- Add a tool to visualize how these approximators are working
- Fix some issues that came up while doign this work
2020-11-03 08:09:58 -07:00
James Betker
85c545835c Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-02 08:48:15 -07:00
James Betker
f13fdd43ed Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-02 08:47:42 -07:00
James Betker
fed16abc22 Report chunking errors 2020-11-02 08:47:18 -07:00
James Betker
a51daacde2 Fix reporting of d_fake_diff for generators 2020-11-02 08:45:46 -07:00
James Betker
3676f26d94 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-31 20:55:45 -06:00
James Betker
dcfe994fee Add standalone srg2_classic
Trying to investigate how I was so misguided. I *thought* srg2 was considerably
better than RRDB in performance but am not actually seeing that.
2020-10-31 20:55:34 -06:00
James Betker
ea8c20c0e2 Fix bug with multiscale_dataset 2020-10-31 20:54:41 -06:00
James Betker
bb39d3efe5 Bump image corruption factor a bit 2020-10-31 20:50:24 -06:00
James Betker
eb7df63592 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-31 11:09:32 -06:00
James Betker
c2866ad8d2 Disable debugging of comparable pingpong generations 2020-10-31 11:09:10 -06:00
James Betker
7303d8c932 Add psnr approximator 2020-10-31 11:08:55 -06:00
James Betker
565517814e Restore SRG2
Going to try to figure out where SRG lost competitiveness to RRDB..
2020-10-30 14:01:56 -06:00
James Betker
b24ff3c88d Fix bug that causes multiscale dataset to crash 2020-10-30 14:01:24 -06:00
James Betker
74738489b9 Fixes and additional support for progressive zoom 2020-10-30 09:59:54 -06:00
James Betker
a3918fa808 Tecogan & other fixes 2020-10-30 00:19:58 -06:00
James Betker
b316078a15 Fix tecogan_losses fp16 2020-10-29 23:02:20 -06:00
James Betker
3791f95ad0 Enable RRDB to take in reference inputs 2020-10-29 11:07:40 -06:00
James Betker
7d38381d46 Add scaling to rrdb 2020-10-29 09:48:10 -06:00
James Betker
607ff3c67c RRDB with bypass 2020-10-29 09:39:45 -06:00
James Betker
1655b9e242 Fix fast_forward teco loss bug 2020-10-28 17:49:54 -06:00
James Betker
25b007a0f5 Increase jpeg corruption & add error 2020-10-28 17:37:39 -06:00
James Betker
796659b0ac Add 'jpeg-normal' corruption 2020-10-28 16:40:47 -06:00
James Betker
515905e904 Add a min_loss that is DDP compatible 2020-10-28 15:46:59 -06:00
James Betker
f133243ac8 Extra logging for teco_resgen 2020-10-28 15:21:22 -06:00
James Betker
2ab5054d4c Add noise to teco disc 2020-10-27 22:48:23 -06:00
James Betker
4dc16d5889 Upgrade tecogan_losses for speed 2020-10-27 22:40:15 -06:00
James Betker
ac3da0c5a6 Make tecogen functional 2020-10-27 21:08:59 -06:00
James Betker
10da206db6 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:59 -06:00
James Betker
9848f4c6cb Add teco_resgen 2020-10-27 20:59:55 -06:00
James Betker
543c384a91 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:16 -06:00
James Betker
da53090ce6 More adjustments to support distributed training with teco & on multi_modal_train 2020-10-27 20:58:03 -06:00
James Betker
00bb568956 further checkpointify spsr_arch 2020-10-27 17:54:28 -06:00
James Betker
c2727a0150 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 15:24:19 -06:00
James Betker
2a3eec8fd7 Fix some distributed training snafus 2020-10-27 15:24:05 -06:00
James Betker
d923a62ed3 Allow SPSR to checkpoint 2020-10-27 15:23:20 -06:00
James Betker
11a9e223a6 Retrofit SPSR_arch so it is capable of accepting a ref 2020-10-27 11:14:36 -06:00
James Betker
8202ee72b9 Re-add original SPSR_arch 2020-10-27 11:00:38 -06:00
James Betker
31cf1ac98d Retrofit full_image_dataset to work with new arch. 2020-10-27 10:26:19 -06:00
James Betker
ade0a129da Include psnr in test.py 2020-10-27 10:25:42 -06:00
James Betker
231137ab0a Revert RRDB back to original model 2020-10-27 10:25:31 -06:00
James Betker
1ce863849a Remove temporary base_model change 2020-10-26 11:13:01 -06:00
James Betker
54accfa693 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-26 11:12:37 -06:00
James Betker
ff58c6484a Fixes to unified chunk datasets to support stereoscopic training 2020-10-26 11:12:22 -06:00
James Betker
b2f803588b Fix multi_modal_train.py 2020-10-26 11:10:22 -06:00
James Betker
f857eb00a8 Allow tecogan losses to compute at 32px 2020-10-26 11:09:55 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
85c07f85d9 Update flownet submodule 2020-10-24 11:59:00 -06:00
James Betker
327cdbe110 Support configurable multi-modal training 2020-10-24 11:57:39 -06:00
James Betker
9c3d059ef0 Updates to be able to train flownet2 in ExtensibleTrainer
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
1dbcbfbac8 Restore ChainedEmbeddingGenWithStructure
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
8e5b6682bf Add PairedFrameDataset 2020-10-23 20:58:07 -06:00
James Betker
7a75d10784 Arch cleanup 2020-10-23 09:35:33 -06:00
James Betker
646d6a621a Support 4x zoom on ChainedEmbeddingGen 2020-10-23 09:25:58 -06:00
James Betker
8636492db0 Copy train.py mods to train2 2020-10-22 17:16:36 -06:00
James Betker
e9c0b9f0fd More adjustments to support multi-modal training
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f Class-ify train.py and workon multi-modal trainer 2020-10-22 16:15:31 -06:00
James Betker
15e00e9014 Finish integration with autocast
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3 Add multi-modal trainer 2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8 Fix multifaceted chain gen 2020-10-22 13:27:06 -06:00
James Betker
f9dc472f63 Misc nonfunctional mods to datasets 2020-10-22 10:16:17 -06:00
James Betker
43c4f92123 Collapse progressive zoom candidates into the batch dimension
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
James Betker
680d635420 Enable ExtensibleTrainer to skip steps when state keys are missing 2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1 Add FFT injector 2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms 2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0 Allow recurrent losses to be weighted 2020-10-21 16:59:44 -06:00
James Betker
5753e77d67 ChainedGen: Output debugging information on blocks 2020-10-21 16:36:23 -06:00
James Betker
b54de69153 Misc 2020-10-21 11:08:21 -06:00
James Betker
71c3820d2d Fix process_video 2020-10-21 11:08:12 -06:00
James Betker
3c6e600e48 Add capacity for models to self-report visuals 2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b Add bypass to ChainedEmbeddingGen 2020-10-21 11:07:45 -06:00
James Betker
d8c6a4bbb8 Misc 2020-10-20 12:56:52 -06:00
James Betker
aba83e7497 Don't apply jpeg corruption & noise corruption together
This causes some severe noise.
2020-10-20 12:56:35 -06:00
James Betker
111450f4e7 Use areal interpolate for multiscale_dataset 2020-10-19 15:30:25 -06:00
James Betker
a63bf2ea2f Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086 Restore test.py for use as standalone validator 2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8 Fix recurrent=None bug in ChainedEmbeddingGen 2020-10-19 15:25:12 -06:00
James Betker
331c40f0c8 Allow starting step to be forced
Useful for testing purposes or to force a validation.
2020-10-19 15:23:04 -06:00
James Betker
8ca566b621 Revert "Misc"
This reverts commit 0e3ea63a14.

# Conflicts:
#	codes/test.py
#	codes/train.py
2020-10-19 13:34:54 -06:00
James Betker
b28e4d9cc7 Add spread loss
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
9b9a6e5925 Add get_paths() to base_unsupervised_image_dataset 2020-10-19 11:30:06 -06:00
James Betker
981d64413b Support validation over a custom injector
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
ffad0e0422 Allow image corruption in multiscale dataset 2020-10-19 10:10:27 -06:00
James Betker
668cafa798 Push correct patch of recurrent embedding to upstream image, rather than whole thing 2020-10-18 22:39:52 -06:00
James Betker
7df378a944 Remove separated vgg discriminator
Checkpointing happens inline instead. Was a dumb idea..

Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5 Fix memory leak with recurrent loss 2020-10-18 10:22:10 -06:00
James Betker
552e70a032 Get rid of excessive checkpointed disc params 2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813 Add a checkpointable discriminator 2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08 Multiscale training in! 2020-10-17 22:54:12 -06:00
James Betker
e706911c83 Fix spinenet bug 2020-10-17 20:20:36 -06:00
James Betker
b008a27d39 Spinenet should allow bypassing the initial conv
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c7f3fc4dd9 Enable chunk_with_reference to work without centers
Moving away from this so it doesn't matter too much. Also fixes an issue
with the "ignore" flag.
2020-10-17 20:09:08 -06:00
James Betker
b45e132a9d Allow first n tiles to be ignored
Helps zoom in with chunked dataset
2020-10-17 09:45:03 -06:00
James Betker
c1c9c5681f Swap recurrence 2020-10-17 08:40:28 -06:00
James Betker
6141aa1110 More recurrence fixes for chainedgen 2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b Allow recurrence to specified for chainedgen 2020-10-17 08:32:29 -06:00
James Betker
fc4c064867 Add recurrent support to chainedgenwithstructure 2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2 Don't use several stages of spinenet_arch
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339 Go back to torch's DDP
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e Add ChainedGenWithStructure 2020-10-16 20:44:36 -06:00
James Betker
96f1be30ed Add use_generator_as_filter 2020-10-16 20:43:55 -06:00
James Betker
617d97e19d Add ChainedEmbeddingGen 2020-10-15 23:18:08 -06:00
James Betker
c4543ce124 Set post_transform_block to None where applicable 2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb SSGSimpler network 2020-10-15 17:18:44 -06:00
James Betker
1ba01d69b5 Move datasets to INTER_AREA interpolation for downsizing
Looks **FAR** better visually
2020-10-15 17:18:23 -06:00
James Betker
d56745b2ec JPEG-broad adjustment 2020-10-15 10:14:51 -06:00
James Betker
eda75c9779 Cleanup fixes 2020-10-15 10:13:17 -06:00
James Betker
920865defb Arch work 2020-10-15 10:13:06 -06:00
James Betker
1dc0b05428 Add multiscale dataset 2020-10-15 10:12:50 -06:00
James Betker
0f4e03183f New image corruptor gradations 2020-10-15 10:12:25 -06:00
James Betker
1f20d59c31 Revert big switch back 2020-10-14 11:03:34 -06:00
James Betker
9815980329 Update SwitchedConv 2020-10-13 20:57:12 -06:00
James Betker
24792bdb4f Codebase cleanup
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba Mods to support video processing with teco networks 2020-10-13 20:47:05 -06:00
James Betker
17d78195ee Mods to SRG to support returning switch logits 2020-10-13 20:46:37 -06:00
James Betker
cc915303a5 Fix SPSR calls into SwitchComputer 2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899 Merge remote-tracking branch 'origin/gan_lab' into gan_lab
# Conflicts:
#	codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9 Add the "BigSwitch" 2020-10-13 10:11:10 -06:00
James Betker
8014f050ac Clear metrics properly
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-12 17:43:51 -06:00
James Betker
731700ab2c checkpoint in ssg 2020-10-12 17:43:28 -06:00
James Betker
ca523215c6 Fix recurrent std in arch 2020-10-12 17:42:32 -06:00
James Betker
05377973bf Allow initial recurrent input to be specified (optionally) 2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6 Add ssgr1 recurrence 2020-10-12 17:18:19 -06:00
James Betker
c1a00f31b7 Update switched_conv 2020-10-12 10:37:45 -06:00
James Betker
d7d7590f3e Fix constant injector - wasn't working in test 2020-10-12 10:36:30 -06:00
James Betker
e7cf337dba Fix bug with chunk_with_reference 2020-10-12 10:23:03 -06:00
James Betker
ce163ad4a9 Update SSGdeep 2020-10-12 10:22:08 -06:00
James Betker
2bc5701b10 misc 2020-10-12 10:21:25 -06:00
James Betker
3409d88a1c Add PANet arch 2020-10-12 10:20:55 -06:00
James Betker
7cbf4fa665 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-11 08:33:30 -06:00
James Betker
92cb83958a Return zeros rather than None when image cant be read 2020-10-11 08:33:18 -06:00
James Betker
a9c2e97391 Constant injector and teco fixes 2020-10-11 08:20:07 -06:00
James Betker
e785029936 Mods needed to support SPSR archs with teco gan 2020-10-10 22:39:55 -06:00
James Betker
120072d464 Add constant injector 2020-10-10 21:50:23 -06:00
James Betker
f99812e14d Fix tecogan_losses errors 2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7 Alter teco_losses to feed a recurrent input in as separate 2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d Add MarginRemoval injector 2020-10-09 20:35:56 -06:00
James Betker
0011d445c8 Fix loss indexing 2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc For element loss added 2020-10-09 19:51:44 -06:00
James Betker
61e5047c60 Fix loss accumulator when buffers are not filled
They were reporting incorrect losses.
2020-10-09 19:47:59 -06:00
James Betker
fe50d6f9d0 Fix attention images 2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c Allow tecogan to be used in process_video 2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69 Add network architecture built for teco 2020-10-09 08:40:14 -06:00
James Betker
b3d0baaf17 Improve multiframe dataset memory usage 2020-10-09 08:40:00 -06:00
James Betker
afe6af88af Fix attention print issue 2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4 Converge SSG architectures into unified switching base class
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
3cc56cd00b Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-08 16:12:05 -06:00
James Betker
7d8d9dafbb misc 2020-10-08 16:12:00 -06:00
James Betker
856ef4d21d Update switched_conv 2020-10-08 16:10:23 -06:00
James Betker
1eb516d686 Fix more distributed bugs 2020-10-08 14:32:45 -06:00
James Betker
b36ba0460c Fix multi-frame dataset OBO error 2020-10-08 12:21:04 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5 Allow tecogan to support generators that only output a tensor (instead of a list) 2020-10-08 09:26:25 -06:00
James Betker
969bcd9021 Use local checkpoint in SSG 2020-10-08 08:54:46 -06:00
James Betker
c93dd623d7 Tecogan losses work 2020-10-07 23:11:58 -06:00
James Betker
29bf78d791 Update switched_conv submodule 2020-10-07 23:11:50 -06:00
James Betker
c96f5b2686 Import switched_conv as a submodule 2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4 More tecogan fixes 2020-10-07 12:41:17 -06:00
James Betker
a62a5dbb5f Clone and detach in recursively_detach 2020-10-07 12:41:00 -06:00
James Betker
1c44d395af Tecogan work
Its training!  There's still probably plenty of bugs though..
2020-10-07 09:03:30 -06:00
James Betker
e9d7371a61 Add concatenate injector 2020-10-07 09:02:42 -06:00
James Betker
8a7e993aea Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-06 20:41:58 -06:00
James Betker
b2c4b2a16d Move gpu_ids out of if statement 2020-10-06 20:40:20 -06:00
James Betker
1e415b249b Add tag that can be applied to prevent parameter training 2020-10-06 20:39:49 -06:00
James Betker
2f2e3f33f8 StackedSwitchedGenerator_5lyr 2020-10-06 20:39:32 -06:00
James Betker
6217b48e3f Fix spsr_arch bug 2020-10-06 20:38:47 -06:00
James Betker
4290918359 Add distributed_checkpoint for more efficient checkpoints 2020-10-06 20:38:38 -06:00
James Betker
cffc596141 Integrate flownet2 into codebase, add teco visual debugs 2020-10-06 20:35:39 -06:00
James Betker
e4b89a172f Reduce spsr7 memory usage 2020-10-05 22:05:56 -06:00
James Betker
4111942ada Support attention deferral in deep ssgr 2020-10-05 19:35:55 -06:00
James Betker
840927063a Work on tecogan losses 2020-10-05 19:35:28 -06:00
James Betker
0e3ea63a14 Misc 2020-10-05 18:01:50 -06:00
James Betker
2875822024 SPSR9 arch
takes some of the stuff I learned with SGSR yesterday and applies it to spsr
2020-10-05 08:47:51 -06:00
James Betker
51044929af Don't compute attention statistics on multiple generator invocations of the same data 2020-10-05 00:34:29 -06:00
James Betker
e760658fdb Another fix.. 2020-10-04 21:08:00 -06:00
James Betker
a890e3a9c0 Fix geometric loss not handling 0 index 2020-10-04 21:05:01 -06:00
James Betker
c3ef8a4a31 Stacked switches - return a tuple 2020-10-04 21:02:24 -06:00
James Betker
13f97e1e97 Add recursive loss 2020-10-04 20:48:15 -06:00
James Betker
ffd069fd97 Lots of SSG work
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
aca2c7ab41 Full checkpoint-ize SSG1 2020-10-04 18:24:52 -06:00
James Betker
fc396baf1a Move loaded_options to util
Doesn't seem to work with python 3.6
2020-10-03 20:29:06 -06:00
James Betker
2d8e9a9d30 Options fix? 2020-10-03 20:27:12 -06:00
James Betker
e3294939b0 Revert "SSG: offer option to use BN-based attention normalization"
Didn't work. Oh well.

This reverts commit 5cd2b37591.
2020-10-03 17:54:53 -06:00
James Betker
43c6c67fd1 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-03 16:17:31 -06:00
James Betker
5cd2b37591 SSG: offer option to use BN-based attention normalization
Not sure how this is going to work, lets try it.
2020-10-03 16:16:19 -06:00
James Betker
c896939523 Fix recursive checkpoint 2020-10-03 16:15:52 -06:00
James Betker
3cbb9ecd45 Misc 2020-10-03 16:15:42 -06:00
James Betker
35731502c3 Fix checkpoint recursion 2020-10-03 12:52:50 -06:00
James Betker
9b4ed82093 Get rid of unused convs in spsr7 2020-10-03 11:36:26 -06:00
James Betker
b2b81b13a4 Remove recursive utils import 2020-10-03 11:30:05 -06:00
James Betker
3561cc164d Fix up fea_loss calculator (for validation)
Not sure how this was working in regular training mode, but it
was failing in DDP.
2020-10-03 11:19:20 -06:00
James Betker
21d3bb83b2 Use tqdm reporting with validation 2020-10-03 11:16:39 -06:00
James Betker
6c9718ad64 Don't log if you aren't 0 rank 2020-10-03 11:14:13 -06:00
James Betker
922b1d76df Don't record visuals when not on rank 0 2020-10-03 11:10:03 -06:00
James Betker
8197fd646f Don't accumulate losses for metrics when the loss isn't a tensor 2020-10-03 11:03:55 -06:00
James Betker
19a4075e1e Allow checkpointing to be disabled in the options file
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
dd9d7b27ac Add more sophisticated mechanism for balancing GAN losses 2020-10-02 22:53:42 -06:00
James Betker
39865ca3df TOTAL_loss, dumbo 2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655 Loss accumulator fix 2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4 ExtensibleTrainer - don't compute backward when there is no loss 2020-10-02 20:54:06 -06:00
James Betker
146a9125f2 Modify geometric & translational losses so they can be used with embeddings 2020-10-02 20:40:13 -06:00
James Betker
e30a1443cd Change sw2 refs 2020-10-02 09:01:18 -06:00
James Betker
e38716925f Fix spsr8 class init 2020-10-02 09:00:18 -06:00
James Betker
efbf6b737b Update validate_data to work with SingleImageDataset 2020-10-02 08:58:34 -06:00
James Betker
35469f08e2 Spsr 8 2020-10-02 08:58:15 -06:00
James Betker
c9a9e5c525 Prompt user for gpu_id if multiple gpus are detected 2020-10-01 17:24:50 -06:00
James Betker
aa4fd89018 resnext with groupnorm 2020-10-01 15:49:28 -06:00
James Betker
8beaa47933 resnext discriminator 2020-10-01 11:48:14 -06:00
James Betker
55f2764fef Allow fixup50 to be used as a discriminator 2020-10-01 11:28:18 -06:00
James Betker
7986185fcb Change 'mod_step' to 'every' 2020-10-01 11:28:06 -06:00
James Betker
d9ae970fd9 SSG update 2020-10-01 11:27:51 -06:00
James Betker
e3053e4e55 Exchange SpsrNet for SpsrNetSimplified 2020-09-30 17:01:04 -06:00
James Betker
66d4512029 Fix up translational equivariance loss so it's ready for prime time 2020-09-30 12:01:00 -06:00
James Betker
896b4f5be2 Revert "spsr7 adjustments"
This reverts commit 9fee1cec71.
2020-09-29 18:30:41 -06:00
James Betker
9fee1cec71 spsr7 adjustments 2020-09-29 17:19:59 -06:00
James Betker
dc8f3b24de Don't let duplicate keys be used for injectors and losses 2020-09-29 16:59:44 -06:00
James Betker
0b5a033503 spsr7 + cleanup
SPSR7 adds ref onto spsr6, makes more "common sense" mods.
2020-09-29 16:59:26 -06:00
James Betker
f9b83176f1 Fix bugs in extensibletrainer 2020-09-28 22:09:42 -06:00
James Betker
db52bec4ab spsr6
This is meant to be a variant of SPSR5 that harkens
back to the simpler earlier architectures that do not
have embeddings or ref_ inputs, but do have deep
multiplexers. It does, however, use some of the new
conjoin mechanisms.
2020-09-28 22:09:27 -06:00
James Betker
7e240f2fed Recurrent / teco work 2020-09-28 22:06:56 -06:00
James Betker
57814f18cf More features for multi-frame-dataset 2020-09-28 14:26:15 -06:00
James Betker
aeaf185314 Add RCAN 2020-09-27 16:00:41 -06:00
James Betker
4d29b7729e Model arch cleanup 2020-09-27 11:18:45 -06:00
James Betker
7dff802144 Add MultiFrameDataset
Retrieves video sequence patches rather than single images.
2020-09-27 11:13:06 -06:00
James Betker
d8c3fc9327 Fix random noise corruptor
It was functioning as a color shift
2020-09-27 11:12:24 -06:00
James Betker
c85da79697 Move many dataset functions into a base class 2020-09-27 11:11:58 -06:00
James Betker
eb12b5f887 Misc 2020-09-26 21:27:17 -06:00
James Betker
31641d7f63 Add ImagePatchInjector and TranslationalLoss 2020-09-26 21:25:32 -06:00
James Betker
d8621e611a BackboneSpineNoHead takes ref 2020-09-26 21:25:04 -06:00
James Betker
5a27187c59 More mods to accomodate new dataset 2020-09-25 22:45:57 -06:00
James Betker
254cb1e915 More dataset integration work 2020-09-25 22:19:38 -06:00
James Betker
6d0490a0e6 Tecogan implementation work 2020-09-25 16:38:23 -06:00
James Betker
ce4613ecb9 Finish up single_image_dataset work
Sweet!
2020-09-25 16:37:54 -06:00
James Betker
1cf73c2cce Fix dataset for a val set that includes lq 2020-09-24 18:01:07 -06:00
James Betker
ea565b7eaf More fixes 2020-09-24 17:51:52 -06:00
James Betker
553917a8d1 Fix torchvision import bug 2020-09-24 17:38:34 -06:00
James Betker
58886109d4 Update how spsr arches do attention to conform with sgsr 2020-09-24 16:53:54 -06:00
James Betker
9a50a7966d SiLU doesnt support inplace 2020-09-23 21:09:13 -06:00
James Betker
eda0eadba2 Use custom SiLU
Torch didnt have this before 1.7
2020-09-23 21:05:06 -06:00
James Betker
05963157c1 Several things
- Fixes to 'after' and 'before' defs for steps (turns out they werent working)
- Feature nets take in a list of layers to extract. Not fully implemented yet.
- Fixes bugs with RAGAN
- Allows real input into generator gan to not be detached by param
2020-09-23 11:56:36 -06:00
James Betker
4ab989e015 try again.. 2020-09-22 18:27:52 -06:00
James Betker
3b6c957194 Fix? it again? 2020-09-22 18:25:59 -06:00
James Betker
7b60d9e0d8 Fix? cosine loss 2020-09-22 18:18:35 -06:00
James Betker
2e18c4c22d Add CosineEmbeddingLoss to F 2020-09-22 17:10:29 -06:00
James Betker
f40beb5460 Add 'before' and 'after' defs to injections, steps and optimizers 2020-09-22 17:03:22 -06:00
James Betker
419f77ec19 Some new backbones 2020-09-21 12:36:49 -06:00
James Betker
9429544a60 Spinenet: implementation without 4x downsampling right off the bat 2020-09-21 12:36:30 -06:00
James Betker
384e3d54cc Extract images into jpg, have a multiplier & size threshold 2020-09-21 12:36:03 -06:00
James Betker
bde35ced47 Fix recursive detach 2020-09-20 19:08:13 -06:00
James Betker
53a5657850 Fix SSGR 2020-09-20 19:07:15 -06:00
James Betker
17c569ea62 Add geometric loss 2020-09-20 16:24:23 -06:00
James Betker
17dd99b29b Fix bug with discriminator noise addition
It wasn't using the scale and was applying the noise to the
underlying state variable.
2020-09-20 12:00:27 -06:00
James Betker
dab8ab8a8f Offer option to configure the size of the normal distribution that the target size is drawn from 2020-09-20 11:59:31 -06:00
James Betker
3138f98fbc Allow discriminator noise to be injected at the loss level, cleans up configs 2020-09-19 21:47:52 -06:00
James Betker
e9a39bfa14 Recursively detach all outputs, even if they are nested in data structures 2020-09-19 21:47:34 -06:00
James Betker
fe82785ba5 Add some new architectures to ssg 2020-09-19 21:47:10 -06:00
James Betker
b83f097082 Get rid of get_debug_values from RRDB, rectify outputs 2020-09-19 21:46:36 -06:00
James Betker
e0bd68efda Add ImageFlowInjector 2020-09-19 10:07:00 -06:00
James Betker
e2a146abc7 Add in experiments hook 2020-09-19 10:05:25 -06:00
James Betker
4f75cf0f02 Revert "Full image dataset operates on lists"
Going with an entirely new dataset instead..

This reverts commit 36ec32bf11.
2020-09-18 09:50:43 -06:00
James Betker
36ec32bf11 Full image dataset operates on lists 2020-09-18 09:50:26 -06:00
James Betker
3cb2a9a9d3 New dataset, initial work 2020-09-18 09:49:13 -06:00
James Betker
9a17ade550 Some convenience adjustments to ExtensibleTrainer 2020-09-17 21:05:32 -06:00
James Betker
57fc3f490c Add script for extracting image tiles with reference images 2020-09-17 13:30:51 -06:00
James Betker
9963b37200 Add a new script for loading a discriminator network and using it to filter images 2020-09-17 13:30:32 -06:00
James Betker
f5cd23e2d5 Further patch size adjustments 2020-09-16 16:50:35 -06:00
James Betker
723754c133 Update attention debugger outputting for SSG 2020-09-16 13:09:46 -06:00
James Betker
0b047e5f80 Increase scale of the patch selector random distribution
This will cause larger slices of an image to appear more frequently,
increasing the difficulty of the generator.
2020-09-16 08:27:42 -06:00
James Betker
f211575e9d Save models before validation
Validation often fails with OOM, wasting hours of training time.
Save models first.
2020-09-16 08:17:17 -06:00
James Betker
0918430572 SSG network
This branches off of SPSR. It is identical but substantially reduced
in complexity. It's intended to be my long term working arch.
2020-09-15 20:59:24 -06:00
James Betker
c833bd1eac Misc changes 2020-09-15 20:57:59 -06:00
James Betker
6deab85b9b Add BackboneEncoderNoRef 2020-09-15 16:55:38 -06:00
James Betker
d0321ca5de Don't load amp state dict if amp is disabled 2020-09-14 15:21:42 -06:00
James Betker
94deab2792 Fix error serving gt_fullsize_ref in full_image_dataset 2020-09-14 15:05:44 -06:00
James Betker
ccf8438001 SPSR5
This is SPSR4, but the multiplexers have access to the output of the transformations
for making their decision.
2020-09-13 20:10:24 -06:00
James Betker
5b85f891af Only log the name of the first network in the total_loss training set 2020-09-12 16:07:09 -06:00
James Betker
fb595e72a4 Supporting infrastructure in ExtensibleTrainer to train spsr4
Need to be able to train 2 nets in one step: the backbone will be entirely separate
with its own optimizer (for an extremely low LR).

This functionality was already present, just not implemented correctly.
2020-09-11 22:57:06 -06:00
James Betker
4e44bca611 SPSR4
aka - return of the backbone! I'm tired of massively overparameterized generators
with pile-of-shit multiplexers. Let's give this another try..
2020-09-11 22:55:37 -06:00
James Betker
19896abaea Clean up old SwitchedSpsr arch
It didn't work anyways, so why not?
2020-09-11 16:09:28 -06:00
James Betker
4c2ee66fe4 Fix video processor 2020-09-11 13:10:14 -06:00
James Betker
50ca17bb0a Feature mode -> back to LR fea 2020-09-11 13:09:55 -06:00
James Betker
1086f0476b Fix ref branch using fixed filters 2020-09-11 08:58:35 -06:00
James Betker
8c469b8286 Enable memory checkpointing 2020-09-11 08:44:29 -06:00
James Betker
5189b11dac Add combined dataset for training across multiple datasets 2020-09-11 08:44:06 -06:00
James Betker
313424d7b5 Add new referencing discriminator
Also extend the way losses work so that you can pass
parameters into the discriminator from the config file
2020-09-10 21:35:29 -06:00
James Betker
9e5aa166de Report the standard deviation of ref branches
This patch also ups the contribution
2020-09-10 16:34:41 -06:00
James Betker
668bfbff6d Back to best arch for spsr3 2020-09-10 14:58:14 -06:00
James Betker
992b0a8d98 spsr3 with conjoin stage as part of the switch 2020-09-10 09:11:37 -06:00
James Betker
e0fc5eb50c Temporary commit - noise 2020-09-09 17:12:52 -06:00
James Betker
00da69d450 Temporary commit - ref 2020-09-09 17:09:44 -06:00
James Betker
df59d6c99d More spsr3 mods
- Most branches get their own noise vector now.
- First attention branch has the intended sole purpose of raw image processing
- Remove norms from joiner block
2020-09-09 16:46:38 -06:00
James Betker
747ded2bf7 Fixes to the spsr3
Some lessons learned:
- Biases are fairly important as a relief valve. They dont need to be everywhere, but
  most computationally heavy branches should have a bias.
- GroupNorm in SPSR is not a great idea. Since image gradients are represented
   in this model, normal means and standard deviations are not applicable. (imggrad
   has a high representation of 0).
- Don't fuck with the mainline of any generative model. As much as possible, all
   additions should be done through residual connections. Never pollute the mainline
   with reference data, do that in branches. It basically leaves the mode untrainable.
2020-09-09 15:28:14 -06:00
James Betker
0ffac391c1 SPSR with ref joining 2020-09-09 11:17:07 -06:00
James Betker
c41dc9a48c Add missing requirements 2020-09-09 10:46:08 -06:00
James Betker
3027e6e27d Enable amp to be disabled 2020-09-09 10:45:59 -06:00
James Betker
c04f244802 More mods 2020-09-08 20:36:27 -06:00
James Betker
dffbfd2ec4 Allow SRG checkpointing to be toggled 2020-09-08 15:14:43 -06:00
James Betker
e6207d4c50 SPSR3 work
SPSR3 is meant to fix whatever is causing the switching units
inside of the newer SPSR architectures to fail and basically
not use the multiplexers.
2020-09-08 15:14:23 -06:00
James Betker
5606e8b0ee Fix SRGAN_model/fullimgdataset compatibility 1 2020-09-08 11:34:35 -06:00
James Betker
22c98f1567 Move MultiConvBlock to arch_util 2020-09-08 08:17:27 -06:00
James Betker
146ace0859 CSNLN changes (removed because it doesnt train well) 2020-09-08 08:04:16 -06:00
James Betker
f43df7f5f7 Make ExtensibleTrainer compatible with process_video 2020-09-08 08:03:41 -06:00
James Betker
a18ece62ee Add updated spsr net for test 2020-09-07 17:01:48 -06:00
James Betker
55475d2ac1 Clean up unused archs 2020-09-07 11:38:11 -06:00
James Betker
e8613041c0 Add novograd optimizer 2020-09-06 17:27:08 -06:00
James Betker
a5c2388368 Use lower LQ image size when it is being fed in 2020-09-06 17:26:32 -06:00
James Betker
b1238d29cb Fix trainable not applying to discriminators 2020-09-05 20:31:26 -06:00
James Betker
21ae135f23 Allow Novograd to be used as an optimizer 2020-09-05 16:50:13 -06:00
James Betker
912a4d9fea Fix srg computer bug 2020-09-05 07:59:54 -06:00
James Betker
0dfd8eaf3b Support injectors that run in eval only 2020-09-05 07:59:45 -06:00
James Betker
17aa205e96 New dataset that reads from lmdb 2020-09-04 17:32:57 -06:00
James Betker
44c75f7642 Undo SRG change 2020-09-04 17:32:16 -06:00
James Betker
6657a406ac Mods needed to support training a corruptor again:
- Allow original SPSRNet to have a specifiable block increment
- Cleanup
- Bug fixes in code that hasnt been touched in awhile.
2020-09-04 15:33:39 -06:00
James Betker
bfdfaab911 Checkpoint RRDB
Greatly reduces memory consumption with a low performance penalty
2020-09-04 15:32:00 -06:00
James Betker
8580490a85 Reduce usage of resize operations when not needed in dataloaders. 2020-09-04 15:31:24 -06:00
James Betker
6226b52130 Pin memory in dataloaders by default 2020-09-04 15:30:46 -06:00
James Betker
64a24503f6 Add extract_subimages_with_ref_lmdb for generating lmdb with reference images 2020-09-04 15:30:34 -06:00
James Betker
696242064c Use tensor checkpointing to drastically reduce memory usage
This comes at the expense of computation, but since we can use much larger
batches, it results in a net speedup.
2020-09-03 11:33:36 -06:00
James Betker
365813bde3 Add InterpolateInjector 2020-09-03 11:32:47 -06:00
James Betker
d90c96e55e Fix greyscale injector 2020-09-02 10:29:40 -06:00
James Betker
8b52d46847 Interpreted feature loss to extensibletrainer 2020-09-02 10:08:24 -06:00
James Betker
886d59d5df Misc fixes & adjustments 2020-09-01 07:58:11 -06:00
James Betker
0a9b85f239 Fix vgg_gn input_img_factor 2020-08-31 09:50:30 -06:00
James Betker
4b4d08bdec Enable testing in ExtensibleTrainer, fix it in SRGAN_model
Also compute fea loss for this.
2020-08-31 09:41:48 -06:00
James Betker
b2091cb698 feamod fix 2020-08-30 08:08:49 -06:00
James Betker
a56e906f9c train HR feature trainer 2020-08-29 09:27:48 -06:00
James Betker
0e859a8082 4x spsr ref (not workin) 2020-08-29 09:27:18 -06:00
James Betker
623f3b99b2 Stupid pathing.. 2020-08-26 17:58:24 -06:00
James Betker
25832930db Update loss with lr crossgan 2020-08-26 17:57:22 -06:00
James Betker
80aa83bfd2 Try copytree for tb_logger again. 2020-08-26 17:55:02 -06:00
James Betker
cbd5e7a986 Support old school crossgan in extensibletrainer 2020-08-26 17:52:35 -06:00
James Betker
b593d8e7c3 Save tb_logger to alt_path 2020-08-26 17:45:07 -06:00
James Betker
8a6a2e6e2e Rev3 of the full image ref arch 2020-08-26 17:11:01 -06:00
James Betker
f35b3ad28f Fix val behavior for ExtensibleTrainer 2020-08-26 08:44:22 -06:00
James Betker
434ed70a9a Wrap vgg disc 2020-08-25 18:14:45 -06:00
James Betker
83f2f8d239 more debugging 2020-08-25 18:12:12 -06:00
James Betker
3f60281da7 Print when wrapping 2020-08-25 18:08:46 -06:00
James Betker
bae18c05e6 wrap disc grad 2020-08-25 17:58:20 -06:00
James Betker
f85f1e21db Turns out, can't do that 2020-08-25 17:18:52 -06:00
James Betker
935a735327 More dohs 2020-08-25 17:05:16 -06:00
James Betker
53e67bdb9c Distribute get_grad_no_padding 2020-08-25 17:03:18 -06:00
James Betker
2f706b7d93 I an inept. 2020-08-25 16:42:59 -06:00
James Betker
8bae0de769 ffffffffffffffffff 2020-08-25 16:41:01 -06:00
James Betker
1fe16f71dd Fix bug reporting spsr gan weight 2020-08-25 16:37:45 -06:00
James Betker
96586d6592 Fix distributed d_grad 2020-08-25 16:06:27 -06:00
James Betker
09a9079e17 Check rank before doing image logging. 2020-08-25 16:00:49 -06:00
James Betker
a1800f45ef Fix for referencingmultiplexer 2020-08-25 15:43:12 -06:00
James Betker
19487d9bbd Fix distributed launch for large distributed runs 2020-08-25 15:42:59 -06:00
James Betker
03eb29a4d9 Fix LQGT dataset 2020-08-25 11:57:25 -06:00
James Betker
a65b07607c Reference network 2020-08-25 11:56:59 -06:00
James Betker
f224907603 Fix LQGT_dataset, add full_image_dataset 2020-08-24 17:12:43 -06:00
James Betker
5ec04aedc8 Let noise be configurable
LQ noise is not currently configurable for some reason..
2020-08-24 15:00:14 -06:00
James Betker
f9276007a8 More fixes to corrupt_fea 2020-08-23 17:52:18 -06:00
James Betker
0005c56cd4 dbg 2020-08-23 17:43:03 -06:00
James Betker
4bb5b3c981 corfea debugging 2020-08-23 17:39:02 -06:00
James Betker
7713cb8df5 Corrupted features in srgan 2020-08-23 17:32:03 -06:00
James Betker
dffc15184d More ExtensibleTrainer work
It runs now, just need to debug it to reach performance parity with SRGAN. Sweet.
2020-08-23 17:22:45 -06:00
James Betker
afdd93fbe9 Grey feature 2020-08-22 13:41:38 -06:00
James Betker
e59e712e39 More ExtensibleTrainer work 2020-08-22 13:08:33 -06:00
James Betker
f40545f235 ExtensibleTrainer work 2020-08-22 08:24:34 -06:00
James Betker
a498d7b1b3 Report l_g_gan_grad before weight multiplication 2020-08-20 11:57:53 -06:00
James Betker
9d77a4db2e Allow initial temperature to be specified to SPSR net for inference 2020-08-20 11:57:34 -06:00
James Betker
24bdcc1181 Let SwitchedSpsr transform count be specified 2020-08-18 09:10:25 -06:00
James Betker
40bb0597bb misc 2020-08-18 08:50:24 -06:00
James Betker
74cdaa2226 Some work on extensible trainer 2020-08-18 08:49:32 -06:00
James Betker
0c98c61f4a Enable start_step to be specified 2020-08-15 18:34:59 -06:00
James Betker
868d0aa442 Undo early dim reduction on grad branch for SPSR_arch 2020-08-14 16:23:42 -06:00
James Betker
2d205f52ac Unite spsr_arch switched gens
Found a pretty good basis model.
2020-08-12 17:04:45 -06:00
James Betker
bdaa67deb7 Misc 2020-08-12 08:46:15 -06:00
James Betker
3d0ece804b SPSR LR2 2020-08-12 08:45:49 -06:00
James Betker
ab04ca1778 Extensible trainer (in progress) 2020-08-12 08:45:23 -06:00
James Betker
cb316fabc7 Use LR data for image gradient prediction when HR data is disjoint 2020-08-10 15:00:28 -06:00
James Betker
f0e2816239 Denoise attention maps 2020-08-10 14:59:58 -06:00
James Betker
59aba1daa7 LR switched SPSR arch
This variant doesn't do conv processing at HR, which should save
a ton of memory in inference. Lets see how it works.
2020-08-10 13:03:36 -06:00
James Betker
4e972144ae More attention fixes for switched_spsr 2020-08-07 21:11:50 -06:00
James Betker
d02509ef97 spsr_switched missing import 2020-08-07 21:05:29 -06:00
James Betker
887806ffa0 Finish up spsr_switched 2020-08-07 21:03:48 -06:00
James Betker
1d5f4f6102 Crossgan 2020-08-07 21:03:39 -06:00
James Betker
fd7b6ca0a9 Comptue gan_grad_branch.... 2020-08-06 12:11:40 -06:00
James Betker
30b16d5235 Update how branch GAN grad is disseminated 2020-08-06 11:13:02 -06:00
James Betker
1f21c02f8b Add cross-compare discriminator 2020-08-06 08:56:21 -06:00
James Betker
be272248af More RAGAN fixes 2020-08-05 16:47:21 -06:00
James Betker
26a6a5d512 Compute grad GAN loss against both the branch and final target, simplify pixel loss
Also fixes a memory leak issue where we weren't detaching our loss stats when
logging them. This stabilizes memory usage substantially.
2020-08-05 12:08:15 -06:00
James Betker
299ee13988 More RAGAN fixes 2020-08-05 11:03:06 -06:00
James Betker
b8a4df0a0a Enable RAGAN in SPSR, retrofit old RAGAN for efficiency 2020-08-05 10:34:34 -06:00
James Betker
3ab39f0d22 Several new spsr nets 2020-08-05 10:01:24 -06:00
James Betker
3c0a2d6efe Fix grad branch debug out 2020-08-04 16:43:43 -06:00
James Betker
ec2a795d53 Fix multistep optimizer (feeding from wrong config params) 2020-08-04 16:42:58 -06:00
James Betker
4bfbdaf94f Don't recompute generator outputs for D in standard operation
Should significantly improve training performance with negligible
results differences.
2020-08-04 11:28:52 -06:00
James Betker
11b227edfc Whoops 2020-08-04 10:30:40 -06:00
James Betker
6d25bcd5df Apply fixes to grad discriminator 2020-08-04 10:25:13 -06:00
James Betker
96d66f51c5 Update requirements 2020-08-03 16:57:56 -06:00
James Betker
c7e5d3888a Add pix_grad_branch loss to metrics 2020-08-03 16:21:05 -06:00
James Betker
0d070b47a7 Add simplified SPSR architecture
Basically just cleaning up the code, removing some bad conventions,
and reducing complexity somewhat so that I can play around with
this arch a bit more easily.
2020-08-03 10:25:37 -06:00
James Betker
47e24039b5 Fix bug that makes feature loss run even when it is off 2020-08-02 20:37:51 -06:00
James Betker
328afde9c0 Integrate SPSR into SRGAN_model
SPSR_model really isn't that different from SRGAN_model. Rather than continuing to re-implement
everything I've done in SRGAN_model, port the new stuff from SPSR over.

This really demonstrates the need to refactor SRGAN_model a bit to make it cleaner. It is quite the
beast these days..
2020-08-02 12:55:08 -06:00
James Betker
c8da78966b Substantial SPSR mods & fixes
- Added in gradient accumulation via mega-batch-factor
- Added AMP
- Added missing train hooks
- Added debug image outputs
- Cleaned up including removing GradientPenaltyLoss, custom SpectralNorm
- Removed all the custom discriminators
2020-08-02 10:45:24 -06:00
James Betker
f894ba8f98 Add SPSR_module
This is a port from the SPSR repo, it's going to need a lot of work to be properly integrated
but as of this commit it at least runs.
2020-08-01 22:02:54 -06:00
James Betker
f33ed578a2 Update how attention_maps are created 2020-08-01 20:23:46 -06:00
James Betker
c139f5cd17 More torch 1.6 fixes 2020-07-31 17:03:20 -06:00
James Betker
a66fbb32b6 Fix fixed_disc DataParallel issue 2020-07-31 16:59:23 -06:00
James Betker
8dd44182e6 Fix scale torch warning 2020-07-31 16:56:04 -06:00
James Betker
bcebed19b7 Fix pixdisc bugs 2020-07-31 16:38:14 -06:00
James Betker
eb11a08d1c Enable disjoint feature networks
This is done by pre-training a feature net that predicts the features
of HR images from LR images. Then use the original feature network
and this new one in tandem to work only on LR/Gen images.
2020-07-31 16:29:47 -06:00
James Betker
6e086d0c20 Fix fixed_disc 2020-07-31 15:07:10 -06:00
James Betker
d5fa059594 Add capability to have old discriminators serve as feature networks 2020-07-31 14:59:54 -06:00
James Betker
6b45b35447 Allow multi_step_lr_scheduler to load a new LR schedule when restoring state 2020-07-31 11:21:11 -06:00
James Betker
e37726f302 Add feature_model for training custom feature nets 2020-07-31 11:20:39 -06:00
James Betker
7629cb0e61 Add FDPL Loss
New loss type that can replace PSNR loss. Works against the frequency domain
and focuses on frequency features loss during hr->lr conversion.
2020-07-30 20:47:57 -06:00
James Betker
85ee64b8d9 Turn down feadisc intensity
Honestly - this feature is probably going to be removed soon, so backwards
compatibility is not a huge deal anymore.
2020-07-27 15:28:55 -06:00
James Betker
ebb199e884 Get rid of safety valve (probably being encountered in val) 2020-07-26 22:51:59 -06:00
James Betker
0892d5fe99 LQGT_dataset gan debug 2020-07-26 22:48:35 -06:00
James Betker
d09ed4e5f7 Misc fixes 2020-07-26 22:44:24 -06:00
James Betker
c54784ae9e Fix feature disc log item error 2020-07-26 22:25:59 -06:00
James Betker
9a8f227501 Allow separate dataset to pushed in for GAN-only training 2020-07-26 21:44:45 -06:00
James Betker
b06e1784e1 Fix SRG4 & switch disc
"fix". hehe.
2020-07-25 17:16:54 -06:00
James Betker
e6e91a1d75 Add SRG4
Back to the idea that maybe what we need is a hybrid
approach between pure switches and RDB.
2020-07-24 20:32:49 -06:00
James Betker
3320ad685f Fix mega_batch_factor not set for test 2020-07-24 12:26:44 -06:00
James Betker
c50cce2a62 Add an abstract, configurabler weight scheduling class and apply it to the feature weight 2020-07-23 17:03:54 -06:00
James Betker
9ccf771629 Fix feature validation, wrong device
Only shows up in distributed training for some reason.
2020-07-23 10:16:34 -06:00
James Betker
a7541b6d8d Fix illegal tb_logger use in distributed training 2020-07-23 09:14:01 -06:00
James Betker
bba283776c Enable find_unused_parameters for DistributedDataParallel
attention_norm has some parameters which are not used to compute grad,
which is causing failures in the distributed case.
2020-07-23 09:08:13 -06:00
James Betker
dbf6147504 Add switched discriminator
The logic is that the discriminator may be incapable of providing a truly
targeted loss for all image regions since it has to be too generic
(basically the same argument for the switched generator). So add some
switches in! See how it works!
2020-07-22 20:52:59 -06:00
James Betker
8a0a1569f3 Enable force_multiple in LQ_dataset 2020-07-22 20:51:16 -06:00
James Betker
106b8da315 Assert that temperature is set properly in eval mode. 2020-07-22 20:50:59 -06:00
James Betker
c74b9ee2e4 Add a way to disable grad on portions of the generator graph to save memory 2020-07-22 11:40:42 -06:00
James Betker
e3adafbeac Add convert_model.py and a hacky way to add extra layers to a model 2020-07-22 11:39:45 -06:00
James Betker
7f7e17e291 Update feature discriminator further
Move the feature/disc losses closer and add a feature computation layer.
2020-07-20 20:54:45 -06:00
James Betker
46aa776fbb Allow feature discriminator unet to only output closest layer to feature output 2020-07-19 19:05:08 -06:00
James Betker
8a9f215653 Huge set of mods to support progressive generator growth 2020-07-18 14:18:48 -06:00
James Betker
47a525241f Make attention norm optional 2020-07-18 07:24:02 -06:00
James Betker
ad97a6a18a Progressive SRG first check-in 2020-07-18 07:23:26 -06:00
James Betker
b08b1cad45 Fix feature decay 2020-07-16 23:27:06 -06:00
James Betker
3e7a83896b Fix pixgan debugging issues 2020-07-16 11:45:19 -06:00
James Betker
a1bff64d1a More fixes 2020-07-16 10:48:48 -06:00
James Betker
240f254263 More loss fixes 2020-07-16 10:45:50 -06:00
James Betker
6cfa67d831 Fix featuredisc broadcast error 2020-07-16 10:18:30 -06:00
James Betker
8d061a2687 Add u-net discriminator with feature output 2020-07-16 10:10:09 -06:00
James Betker
0c4c388e15 Remove dualoutputsrg
Good idea, didn't pan out.
2020-07-16 10:09:24 -06:00
James Betker
4bcc409fc7 Fix loadSRG2 typo 2020-07-14 10:20:53 -06:00
James Betker
1e4083a35b Apply temperature mods to all SRG models
(Honestly this needs to be base classed at this point)
2020-07-14 10:19:35 -06:00
James Betker
7659bd6818 Fix temperature equation 2020-07-14 10:17:14 -06:00
James Betker
853468ef82 Allow legacy state_dicts in srg2 2020-07-14 10:03:45 -06:00
James Betker
1b1431133b Add DualOutputSRG
Also removes the old multi-return mechanism that Generators support.
Also fixes AttentionNorm.
2020-07-14 09:28:24 -06:00
James Betker
a2285ff2ee Scale anorm by transform count 2020-07-13 08:49:09 -06:00
James Betker
dd0bbd9a7c Enable AttentionNorm on SRG2 2020-07-13 08:38:17 -06:00
James Betker
4c0f770f2a Fix inverted temperature curve bug 2020-07-12 11:02:50 -06:00
James Betker
14d23b9d20 Fixes, do fake swaps less often in pixgan discriminator 2020-07-11 21:22:11 -06:00
James Betker
ba6187859a err5 2020-07-10 23:02:56 -06:00
James Betker
902527dfaa err4 2020-07-10 23:00:21 -06:00
James Betker
020b3361fa err3 2020-07-10 22:57:34 -06:00
James Betker
b3a2c21250 err2 2020-07-10 22:52:02 -06:00
James Betker
716433db1f err1 2020-07-10 22:50:56 -06:00
James Betker
ef9f1307eb Sometimes don't use compression artifacts 2020-07-10 22:25:53 -06:00
James Betker
0b7193392f Implement unet disc
The latest discriminator architecture was already pretty much a unet. This
one makes that official and uses shared layers. It also upsamples one additional
time and throws out the lowest upsampling result.

The intent is to delete the old vgg pixdisc, but I'll keep it around for a bit since
I'm still trying out a few models with it.
2020-07-10 16:24:42 -06:00
James Betker
812c684f7d Update pixgan swap algorithm
- Swap multiple blocks in the image instead of just one. The discriminator was clearly
  learning that most blocks have one region that needs to be fixed.
- Relax block size constraints. This was in place to gaurantee that the discriminator
  signal was clean. Instead, just downsample the "loss image" with bilinear interpolation.
  The result is noisier, but this is actually probably healthy for the discriminator.
2020-07-10 15:56:14 -06:00
James Betker
33ca3832e1 Move ExpansionBlock to arch_util
Also makes all processing blocks have a conformant signature.

Alters ExpansionBlock to perform a processing conv on the passthrough
before the conjoin operation - this will break backwards compatibilty with SRG2.
2020-07-10 15:53:41 -06:00
James Betker
5e8b52f34c Misc changes 2020-07-10 09:45:48 -06:00
James Betker
5f2c722a10 SRG2 revival
Big update to SRG2 architecture to pull in a lot of things that have been learned:
- Use group norm instead of batch norm
- Initialize the weights on the transformations low like is done in RRDB rather than using the scalar. Models live or die by their early stages, and this ones early stage is pretty weak
- Transform multiplexer to use u-net like architecture.
- Just use one set of configuration variables instead of a list - flat networks performed fine in this regard.
2020-07-09 17:34:51 -06:00
James Betker
12da993da8 More fixes... 2020-07-08 22:07:09 -06:00
James Betker
7d6eb28b87 More fixes 2020-07-08 22:00:57 -06:00
James Betker
b2507be13c Fix up pixgan loss and pixdisc 2020-07-08 21:27:48 -06:00
James Betker
26a4a66d1c Bug fixes and new gan mechanism
- Removed a bunch of unnecessary image loggers. These were just consuming space and never being viewed
- Got rid of support of artificial var_ref support. The new pixdisc is what i wanted to implement then - it's much better.
- Add pixgan GAN mechanism. This is purpose-built for the pixdisc. It is intended to promote a healthy discriminator
- Megabatchfactor was applied twice on metrics, fixed that

Adds pix_gan (untested) which swaps a portion of the fake and real image with each other, then expects the discriminator
to properly discriminate the swapped regions.
2020-07-08 17:40:26 -06:00
James Betker
4305be97b4 Update log metrics
They should now be universal regardless of job configuration
2020-07-07 15:33:22 -06:00
James Betker
8a4eb8241d SRG3 work
Operates on top of a pre-trained SpineNET backbone (trained on CoCo 2017 with RetinaNet)

This variant is extremely shallow.
2020-07-07 13:46:40 -06:00
James Betker
0acad81035 More SRG2 adjustments.. 2020-07-06 22:40:40 -06:00
James Betker
086b2f0570 More bugs 2020-07-06 22:28:07 -06:00
James Betker
d4d4f85fc0 Bug fixes 2020-07-06 22:25:40 -06:00
James Betker
3c31bea1ac SRG2 architectural changes 2020-07-06 22:22:29 -06:00
James Betker
9a1c3241f5 Switch discriminator to groupnorm 2020-07-06 20:59:59 -06:00
James Betker
60c6352843 Misc 2020-07-06 20:44:07 -06:00
James Betker
6beefa6d0c PixDisc - Add two more levels of losses coming from this gen at higher resolutions 2020-07-06 11:15:52 -06:00
James Betker
2636d3b620 Fix assertion error 2020-07-06 09:23:53 -06:00
James Betker
8f92c0a088 Interpolate attention well before softmax 2020-07-06 09:18:30 -06:00
James Betker
72f90cabf8 More pixdisc fixes 2020-07-05 22:03:16 -06:00
James Betker
909007ee6a Add G_warmup
Let the Generator get to a point where it is at least competing with the discriminator before firing off.

Backwards from most GAN architectures, but this one is a bit different from most.
2020-07-05 21:58:35 -06:00
James Betker
a47a5dca43 Fix pixdisc bug 2020-07-05 21:57:52 -06:00
James Betker
d0957bd7d4 Alter weight initialization for transformation blocks 2020-07-05 17:32:46 -06:00
James Betker
16d1bf6dd7 Replace ConvBnRelus in SRG2 with Silus 2020-07-05 17:29:20 -06:00
James Betker
10f7e49214 Add ConvBnSilu to replace ConvBnRelu
Relu produced good performance gains over LeakyRelu, but
GAN performance degraded significantly. Try SiLU as an alternative
to see if it's the leaky-ness we are looking for or the smooth activation
curvature.
2020-07-05 13:39:08 -06:00
James Betker
9934e5d082 Move SRG1 to identical to new 2020-07-05 08:49:34 -06:00
James Betker
416538f31c SRG1 conjoined except ConvBnRelu 2020-07-05 08:44:17 -06:00
James Betker
c58c2b09ca Back to remove all biases (looks like a ConvBnRelu made its way in..) 2020-07-04 22:41:02 -06:00
James Betker
86cda86e94 Re-add biases, also add new init
A/B testing where we lost our GAN competitiveness.
2020-07-04 22:24:42 -06:00
James Betker
b03741f30e Remove all biases from generator
Continuing to investigate loss of GAN competitiveness, this is a big difference
between "old" SRG1 and "new".
2020-07-04 22:19:55 -06:00
James Betker
726e946e79 Turn BN off in SRG1
This wont work well but just testing if GAN performance comes back
2020-07-04 14:51:27 -06:00
James Betker
0ee39d419b OrderedDict not needed 2020-07-04 14:09:27 -06:00
James Betker
9048105b72 Break out SRG1 as separate network
Something strange is going on. These networks do not respond to
discriminator gradients properly anymore. SRG1 did, however so
reverting back to last known good state to figure out why.
2020-07-04 13:28:50 -06:00
James Betker
188de5e15a Misc changes 2020-07-04 13:22:50 -06:00
James Betker
510b2f887d Remove RDB from srg2
Doesnt seem to work so great.
2020-07-03 22:31:20 -06:00
James Betker
77d3765364 Fix new feature loss calc 2020-07-03 22:20:13 -06:00
James Betker
ed6a15e768 Add feature to dataset which allows it to force images to be a certain size. 2020-07-03 15:19:16 -06:00
James Betker
da4335c25e Add a feature-based validation test 2020-07-03 15:18:57 -06:00
James Betker
703dec4472 Add SpineNet & integrate with SRG
New version of SRG uses SpineNet for a switch backbone.
2020-07-03 12:07:31 -06:00
James Betker
3ed7a2b9ab Move ConvBnRelu/Lelu to arch_util 2020-07-03 12:06:38 -06:00
James Betker
ea9c6765ca Move train imports into init_dist 2020-07-02 15:11:21 -06:00
James Betker
e9ee67ff10 Integrate RDB into SRG
The last RDB for each cluster is switched.
2020-07-01 17:19:55 -06:00
James Betker
6ac6c95177 Fix scaling bug 2020-07-01 16:42:27 -06:00
James Betker
30653181ba Experiment: get rid of post_switch_conv 2020-07-01 16:30:40 -06:00
James Betker
17191de836 Experiment: bring initialize_weights back again
Something really strange going on here..
2020-07-01 15:58:13 -06:00
James Betker
d1d573de07 Experiment: new init and post-switch-conv 2020-07-01 15:25:54 -06:00
James Betker
480d1299d7 Remove RRDB with switching
This idea never really panned out, removing it.
2020-07-01 12:08:32 -06:00
James Betker
e2398ac83c Experiment: revert initialization changes 2020-07-01 12:08:09 -06:00
James Betker
78276afcaa Experiment: Back to lelu 2020-07-01 11:43:25 -06:00
James Betker
b945021c90 SRG v2 - Move to Relu, rely on Module-based initialization 2020-07-01 11:33:32 -06:00
James Betker
ee6443ad7d Add numeric stability computation script 2020-07-01 11:30:34 -06:00
James Betker
c0bb123504 Misc changes 2020-07-01 11:28:23 -06:00
James Betker
604763be68 NSG r7
Converts the switching trunk to a VGG-style network to make it more comparable
to SRG architectures.
2020-07-01 09:54:29 -06:00
James Betker
87f1e9c56f Invert ResGen2 to operate in LR space 2020-06-30 20:57:40 -06:00
James Betker
e07d8abafb NSG rev 6
- Disable style passthrough
- Process multiplexers starting at base resolution
2020-06-30 20:47:26 -06:00
James Betker
3ce1a1878d NSG improvements (r5)
- Get rid of forwards(), it makes numeric_stability.py not work properly.
- Do stability auditing across layers.
- Upsample last instead of first, work in much higher dimensionality for transforms.
2020-06-30 16:59:57 -06:00
James Betker
75f148022d Even more NSG improvements (r4) 2020-06-30 13:52:47 -06:00
James Betker
773753073f More NSG improvements (v3)
Move to a fully fixup residual network for the switch (no
batch norms). Fix a bunch of other small bugs. Add in a
temporary latent feed-forward from the bottom of the
switch. Fix several initialization issues.
2020-06-29 20:26:51 -06:00
James Betker
4b82d0815d NSG improvements
- Just use resnet blocks for the multiplexer trunk of the generator
- Every block initializes itself, rather than everything at the end
- Cleans up some messy parts of the architecture, including unnecessary
  kernel sizes and places where BN is not used properly.
2020-06-29 10:09:51 -06:00
James Betker
978036e7b3 Add NestedSwitchGenerator
An evolution of SwitchedResidualGenerator, this variant nests attention
modules upon themselves to extend the representative capacity of the
model significantly.
2020-06-28 21:22:05 -06:00
James Betker
6f2bc36c61 Distill_torchscript mods
Starts down the path of writing a custom trace that works using torch's hook mechanism.
2020-06-27 08:28:09 -06:00
James Betker
db08dedfe2 Add recover_tensorboard_log
Generates a tb_logger from raw console output. Useful for colab sessions
that crash.
2020-06-27 08:26:57 -06:00
James Betker
c8a670842e Missed networks.py in last commit 2020-06-25 18:36:06 -06:00
James Betker
407224eba1 Re-work SwitchedResgen2
Got rid of the converged multiplexer bases but kept the configurable architecture. The
new multiplexers look a lot like the old one.

Took some queues from the transformer architecture: translate image to a higher filter-space
and stay there for the duration of the models computation. Also perform convs after each
switch to allow the model to anneal issues that arise.
2020-06-25 18:17:05 -06:00
James Betker
42a10b34ce Re-enable batch norm on switch processing blocks
Found out that batch norm is causing the switches to init really poorly -
not using a significant number of transforms. Might be a great time to
re-consider using the attention norm, but for now just re-enable it.
2020-06-24 21:15:17 -06:00
James Betker
4001db1ede Add ConfigurableSwitchComputer 2020-06-24 19:49:37 -06:00
James Betker
83c3b8b982 Add parameterized noise injection into resgen 2020-06-23 10:16:02 -06:00
James Betker
0584c3b587 Add negative_transforms switch to resgen 2020-06-23 09:41:12 -06:00
James Betker
dfcbe5f2db Add capability to place additional conv into discriminator
This should allow us to support larger images sizes. May need
to add another one of these.
2020-06-23 09:40:33 -06:00
James Betker
bad33de906 Add simple resize to extract images 2020-06-23 09:39:51 -06:00
James Betker
030648f2bc Remove batchnorms from resgen 2020-06-22 17:23:36 -06:00
James Betker
68bcab03ae Add growth channel to switch_growths for flat networks 2020-06-22 10:40:16 -06:00
James Betker
3b81712c49 Remove BN from transforms 2020-06-19 16:52:56 -06:00
James Betker
61364ec7d0 Fix inverse temperature curve logic and add upsample factor 2020-06-19 09:18:30 -06:00
James Betker
0551139b8d Fix resgen temperature curve below 1
It needs to be inverted to maintain a true linear curve
2020-06-18 16:08:07 -06:00
James Betker
efc80f041c Save & load amp state 2020-06-18 11:38:48 -06:00
James Betker
2e3b6bad77 Log tensorboard directly into experiments directory 2020-06-18 11:33:02 -06:00
James Betker
778e7b6931 Add a double-step to attention temperature 2020-06-18 11:29:31 -06:00
James Betker
d2d5e097d5 Add profiling to SRGAN for testing timings 2020-06-18 11:29:10 -06:00
James Betker
45a900fafe Misc 2020-06-18 11:28:55 -06:00
James Betker
59b0533b06 Fix attimage step size 2020-06-17 18:45:24 -06:00
James Betker
645d0ca767 ResidualGen mods
- Add filters_mid spec which allows a expansion->squeeze for the transformation layers.
- Add scale and bias AFTER the switch
- Remove identity transform (models were converging on this)
- Move attention image generation and temperature setting into new function which gets called every step with a save path
2020-06-17 17:18:28 -06:00
James Betker
6f8406fbdc Fixed ConfigurableSwitchedGenerator bug 2020-06-16 16:53:57 -06:00
James Betker
7d541642aa Get rid of SwitchedResidualGenerator
Just use the configurable one instead..
2020-06-16 16:23:29 -06:00
James Betker
379b96eb55 Output histograms with SwitchedResidualGenerator
This also fixes the initialization weight for the configurable generator.
2020-06-16 15:54:37 -06:00
James Betker
f8b67f134b Get proper contiguous view for backwards compatibility 2020-06-16 14:27:16 -06:00
James Betker
2def96203e Mods to SwitchedResidualGenerator_arch
- Increased processing for high-resolution switches
- Do stride=2 first in HalvingProcessingBlock
2020-06-16 14:19:12 -06:00
James Betker
70c764b9d4 Create a configurable SwichedResidualGenerator
Also move attention image generator out of repo
2020-06-16 13:24:07 -06:00
James Betker
df1046c318 New arch: SwitchedResidualGenerator_arch
The concept here is to use switching to split the generator into two functions:
interpretation and transformation. Transformation is done at the pixel level by
relatively simple conv layers, while interpretation is computed at various levels
by far more complicated conv stacks. The two are merged using the switching
mechanism.

This architecture is far less computationally intensive that RRDB.
2020-06-16 11:23:50 -06:00
James Betker
ddfd7f67a0 Get rid of biggan
Not really sure it's a great fit for what is being done here.
2020-06-16 11:21:44 -06:00
James Betker
0a714e8451 Fix initialization in mhead switched rrdb 2020-06-15 21:32:03 -06:00
James Betker
be7982b9ae Add skip heads to switcher
These pass through the input so that it can be selected by the attention mechanism.
2020-06-14 12:46:54 -06:00
James Betker
6c27ddc9b5 Misc 2020-06-14 11:03:02 -06:00
James Betker
6c0e9f45c7 Add GPU mem tracing module 2020-06-14 11:02:54 -06:00
James Betker
48532a0a8a Fix initial_stride on lowdim models 2020-06-14 11:02:16 -06:00
James Betker
532704af40 Multiple modifications for experimental RRDB architectures
- Add LowDimRRDB; essentially a "normal RRDB" but the RDB blocks process at a low dimension using PixelShuffle
- Add switching wrappers around it
- Add support for switching on top of multi-headed inputs and outputs
- Moves PixelUnshuffle to arch_util
2020-06-13 11:37:27 -06:00
James Betker
e89f28ead0 Update multirrdb to do HR fixing in the base image dimension. 2020-06-11 08:43:39 -06:00
James Betker
d3b2cbfe7c Fix loading new state dicts for RRDB 2020-06-11 08:25:57 -06:00
James Betker
5ca53e7786 Add alternative first block for PixShuffleRRDB 2020-06-10 21:45:24 -06:00
James Betker
43b7fccc89 Fix mhead attention integration bug for RRDB 2020-06-10 12:02:33 -06:00
James Betker
12e8fad079 Add serveral new RRDB architectures 2020-06-09 13:28:55 -06:00
James Betker
296135ec18 Add doResizeLoss to dataset
doResizeLoss has a 50% chance to resize the LQ image to 50% size,
then back to original size. This is useful to training a generator to
recover these lost pixel values while also being able to do
repairs on higher resolution images during training.
2020-06-08 11:27:06 -06:00
James Betker
786a4288d6 Allow switched RRDBNet to record metrics and decay temperature 2020-06-08 11:10:38 -06:00
James Betker
ae3301c0ea SwitchedRRDB work
Renames AttentiveRRDB to SwitchedRRDB. Moves SwitchedConv to
an external repo (neonbjb/switchedconv). Switchs RDB blocks instead
of conv blocks. Works good!
2020-06-08 08:47:34 -06:00
James Betker
93528ff8df Merge branch 'gan_lab' of https://github.com/neonbjb/mmsr into gan_lab 2020-06-07 16:59:31 -06:00
James Betker
805bd129b7 Switched conv partial impl 2020-06-07 16:59:22 -06:00
James Betker
9e203d07c4 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-06-07 16:56:12 -06:00
James Betker
299d855b34 Enable forced learning rates 2020-06-07 16:56:05 -06:00
James Betker
efb5b3d078 Add switched_conv 2020-06-07 16:45:07 -06:00
James Betker
063719c5cc Fix attention conv bugs 2020-06-06 18:31:02 -06:00
James Betker
cbedd6340a Add RRDB with attention 2020-06-05 21:02:08 -06:00
James Betker
ef5d8a0ed1 Misc 2020-06-05 21:01:50 -06:00
James Betker
318a604405 Allow weighting of input data
This essentially allows you to give some datasets more
importance than others for the purposes of reaching a more
refined network.
2020-06-04 10:05:21 -06:00
James Betker
edf0f8582e Fix rrdb bug 2020-06-02 11:15:55 -06:00
James Betker
dc17545083 Add RRDB Initial Stride
Allows downsampling immediately before processing, which reduces network complexity on
higher resolution images but keeps a higher filter count.
2020-06-02 10:47:15 -06:00
James Betker
76a38b6a53 Misc 2020-06-02 09:35:52 -06:00
James Betker
726d1913ac Allow validating in batches, remove val size limit 2020-06-02 08:41:22 -06:00