Commit Graph

1140 Commits

Author SHA1 Message Date
James Betker
c717765bcb Notes for lucidrains converter. 2020-12-18 09:55:38 -07:00
James Betker
b4720ea377 Move stylegan to new location 2020-12-18 09:52:36 -07:00
James Betker
1708136b55 Commit my attempt at "conforming" the lucidrains stylegan implementation to the reference spec. Not working. will probably be abandoned. 2020-12-18 09:51:48 -07:00
James Betker
209332292a Rosinality stylegan fix 2020-12-18 09:50:41 -07:00
James Betker
d875ca8342 More refactor changes 2020-12-18 09:24:31 -07:00
James Betker
5640e4efe4 More refactoring 2020-12-18 09:18:34 -07:00
James Betker
b905b108da Large cleanup
Removed a lot of old code that I won't be touching again. Refactored some
code elements into more logical places.
2020-12-18 09:10:44 -07:00
James Betker
2f0a52b7db misc changes 2020-12-18 08:53:45 -07:00
James Betker
a8179ff53c Image label work 2020-12-18 08:53:18 -07:00
James Betker
3074f41877 Get rosinality model converter to work
Mostly, just needed to remove the custom cuda ops, not so bueno on Windows.
2020-12-17 16:03:39 -07:00
James Betker
e838c6e75b Rosinality stylegan2 port 2020-12-17 14:18:46 -07:00
James Betker
12cf052889 Add an image patch labeling UI 2020-12-17 10:16:21 -07:00
James Betker
49327b99fe SRFlow outputs RRDB output 2020-12-16 10:28:02 -07:00
James Betker
c25b49bb12 Clean up of SRFlowNet_arch 2020-12-16 10:27:38 -07:00
James Betker
42ac8e3eeb Remove unnecessary comment from SRFlowNet 2020-12-16 09:43:07 -07:00
James Betker
fb2cfc795b Update requirements, add image_patch_classifier tool 2020-12-16 09:42:50 -07:00
James Betker
09de3052ac Add softmax to spinenet classification head 2020-12-16 09:42:15 -07:00
James Betker
4310e66848 Fix bug in 'corrupt_before_downsize=true' 2020-12-16 09:41:59 -07:00
James Betker
8661207d57 Merge branch 'gan_lab' of https://github.com/neonbjb/DL-Art-School into gan_lab 2020-12-15 17:16:48 -07:00
James Betker
fc376d34b2 Spinenet with logits head 2020-12-15 17:16:19 -07:00
James Betker
8e0e883050 Mods to support labeled datasets & random augs for those datasets 2020-12-15 17:15:56 -07:00
James Betker
e5a3e6b9b5 srflow latent space misc 2020-12-14 23:59:49 -07:00
James Betker
1e14635d88 Add exclusions to extract_subimages_with_ref 2020-12-14 23:59:41 -07:00
James Betker
0a19e53df0 BYOL mods 2020-12-14 23:59:11 -07:00
James Betker
ef7eabf457 Allow RRDB to upscale 8x 2020-12-14 23:58:52 -07:00
James Betker
087e9280ed Add labeling feature to image_folder_dataset 2020-12-14 23:58:37 -07:00
James Betker
ec0ee25f4b Structural latents checkpoint 2020-12-11 12:01:09 -07:00
James Betker
26ceca68c0 BYOL with structure! 2020-12-10 15:07:35 -07:00
James Betker
9c5e272a22 Script to extract models from a wrapped BYOL model 2020-12-10 09:57:52 -07:00
James Betker
a5630d282f Get rid of 2nd trainer 2020-12-10 09:57:38 -07:00
James Betker
8e4b9f42fd New BYOL dataset which uses a form of RandomCrop that lends itself to
structural guidance to the latents.
2020-12-10 09:57:18 -07:00
James Betker
c203cee31e Allow swapping to torch DDP as needed in code 2020-12-09 15:03:59 -07:00
James Betker
66cbae8731 Add random_dataset for testing 2020-12-09 14:55:05 -07:00
James Betker
97ff25a086 BYOL!
Man, is there anything ExtensibleTrainer can't train? :)
2020-12-08 13:07:53 -07:00
James Betker
5369cba8ed Stage 2020-12-08 00:33:07 -07:00
James Betker
bca59ed98a Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-12-07 12:51:04 -07:00
James Betker
ea56eb61f0 Fix DDP errors for discriminator
- Don't define training_net in define_optimizers - this drops the shell and leads to problems downstream
- Get rid of support for multiple training nets per opt. This was half baked and needs a better solution if needed
   downstream.
2020-12-07 12:50:57 -07:00
James Betker
c0aeaabc31 Spinenet playground 2020-12-07 12:49:32 -07:00
James Betker
88fc049c8d spinenet latent playground! 2020-12-05 20:30:36 -07:00
James Betker
11155aead4 Directly use dataset keys
This has been a long time coming. Cleans up messy "GT" nomenclature and simplifies ExtensibleTraner.feed_data
2020-12-04 20:14:53 -07:00
James Betker
8a83b1c716 Go back to apex DDP, fix distributed bugs 2020-12-04 16:39:21 -07:00
James Betker
7a81d4e2f4 Revert gaussian loss changes 2020-12-04 12:49:20 -07:00
James Betker
711780126e Cleanup 2020-12-03 23:42:51 -07:00
James Betker
ac7256d4a3 Do tqdm reporting when calculating flow_gaussian_nll 2020-12-03 23:42:29 -07:00
James Betker
dc9ff8e05b Allow the majority of the srflow steps to checkpoint 2020-12-03 23:41:57 -07:00
James Betker
06d1c62c5a iGPT support!
Sweeeeet
2020-12-03 15:32:21 -07:00
James Betker
c18adbd606 Delete mdcn & panet
Garbage, all of it.
2020-12-02 22:25:57 -07:00
James Betker
f2880b33c9 Get rid of mean shift from MDCN 2020-12-02 14:18:33 -07:00
James Betker
8a00f15746 Implement FlowGaussianNll evaluator 2020-12-02 14:09:54 -07:00
James Betker
edf408508c Fix discriminator 2020-12-01 17:45:56 -07:00
James Betker
c963e5f2ce Add ImageFolderDataset
This one has been a long time coming.. How does torch not have something like this?
2020-12-01 17:45:37 -07:00
James Betker
9a421a41f4 SRFlow: accomodate mismatches between global scale and flow_scale 2020-12-01 11:11:51 -07:00
James Betker
8f65f81ddb Adjustments to subimage extractor 2020-12-01 11:11:30 -07:00
James Betker
e343722d37 Add stepped rrdb 2020-12-01 11:11:15 -07:00
James Betker
2e0bbda640 Remove unused archs 2020-12-01 11:10:48 -07:00
James Betker
a1c8300052 Add mdcn 2020-11-30 16:14:21 -07:00
James Betker
1e0f69e34b extra_conv in gn discriminator, multiframe support in rrdb. 2020-11-29 15:39:50 -07:00
James Betker
da604752e6 Misc RRDB changes 2020-11-29 12:21:31 -07:00
James Betker
f2422f1d75 Latent space playground 2020-11-29 09:33:29 -07:00
James Betker
a1d4c9f83c multires rrdb work 2020-11-28 14:35:46 -07:00
James Betker
929cd45c05 Fix for RRDB scale 2020-11-27 21:37:10 -07:00
James Betker
71fa532356 Adjustments to how flow networks set size and scale 2020-11-27 21:37:00 -07:00
James Betker
6f958bb150 Maybe this is necessary after all? 2020-11-27 15:21:13 -07:00
James Betker
ef8d5f88c1 Bring split gaussian nll out of split so it can be computed accurately with the rest of the nll component 2020-11-27 13:30:21 -07:00
James Betker
11d2b70bdd Latent space playground work 2020-11-27 12:03:16 -07:00
James Betker
4ab49b0d69 RRDB disc work 2020-11-27 12:03:08 -07:00
James Betker
6de4dabb73 Remove srflow (modified version)
Starting from orig and re-working from there.
2020-11-27 12:02:06 -07:00
James Betker
5f5420ff4a Update to srflow_latent_space_playground 2020-11-26 20:31:21 -07:00
James Betker
fd356580c0 Play with lambdas 2020-11-26 20:30:55 -07:00
James Betker
0c6d7971b9 Dataset documentation 2020-11-26 11:58:39 -07:00
James Betker
45a489110f Fix datasets 2020-11-26 11:50:38 -07:00
James Betker
5edaf085e0 Adjustments to latent_space_playground 2020-11-25 15:52:36 -07:00
James Betker
205c9a5335 Learn how to functionally use srflow networks 2020-11-25 13:59:06 -07:00
James Betker
cb045121b3 Expose srflow rrdb 2020-11-24 13:20:20 -07:00
James Betker
f3c1fc1bcd Dataset modifications 2020-11-24 13:20:12 -07:00
James Betker
f6098155cd Mods to tecogan to allow use of embeddings as input 2020-11-24 09:24:02 -07:00
James Betker
b10bcf6436 Rework stylegan_for_sr to incorporate structure as an adain block 2020-11-23 11:31:11 -07:00
James Betker
519ba6f10c Support 2x RRDB with 4x srflow 2020-11-21 14:46:15 -07:00
James Betker
cad92bada8 Report logp and logdet for srflow 2020-11-21 10:13:05 -07:00
James Betker
c37d3faa58 More adjustments to srflow_orig 2020-11-20 19:38:33 -07:00
James Betker
d51d12a41a Adjustments to srflow to (maybe?) fix training 2020-11-20 14:44:24 -07:00
James Betker
6c8c35ac47 Support training RRDB encoder [srflow] 2020-11-20 10:03:06 -07:00
James Betker
5ccdbcefe3 srflow_orig integration 2020-11-19 23:47:24 -07:00
James Betker
f80acfcab6 Throw if dataset isn't going to work with force_multiple setting 2020-11-19 23:47:00 -07:00
James Betker
2b2d754d8e Bring in an original SRFlow implementation for reference 2020-11-19 21:42:39 -07:00
James Betker
1e0d7be3ce "Clean up" SRFlow 2020-11-19 21:42:24 -07:00
James Betker
d7877d0a36 Fixes to teco losses and translational losses 2020-11-19 11:35:05 -07:00
James Betker
b2a05465fc Fix missing requirements 2020-11-18 10:16:39 -07:00
James Betker
5c10264538 Remove pyramid_disc hard dependencies 2020-11-17 18:34:11 -07:00
James Betker
6b679e2b51 Make grad_penalty available to classical discs 2020-11-17 18:31:40 -07:00
James Betker
8a19c9ae15 Add additive mode to rrdb 2020-11-16 20:45:09 -07:00
James Betker
2a507987df Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-15 16:16:30 -07:00
James Betker
931ed903c1 Allow combined additive loss 2020-11-15 16:16:18 -07:00
James Betker
4b68116977 import fix 2020-11-15 16:15:42 -07:00
James Betker
98eada1e4c More circular dependency fixes + unet fixes 2020-11-15 11:53:35 -07:00
James Betker
e587d549f7 Fix circular imports 2020-11-15 11:32:35 -07:00
James Betker
99f0cfaab5 Rework stylegan2 divergence losses
Notably: include unet loss
2020-11-15 11:26:44 -07:00
James Betker
ea94b93a37 Fixes for unet 2020-11-15 10:38:33 -07:00
James Betker
89f56b2091 Fix another import 2020-11-14 22:10:45 -07:00
James Betker
9af049c671 Import fix for unet 2020-11-14 22:09:18 -07:00
James Betker
5cade6b874 Move stylegan2 around, bring in unet 2020-11-14 22:04:48 -07:00
James Betker
4c6b14a3f8 Allow extract_square_images to work on multiple images 2020-11-14 20:24:05 -07:00
James Betker
125cb16dce Add a FID evaluator for stylegan with structural guidance 2020-11-14 20:16:07 -07:00
James Betker
c9258e2da3 Alter how structural guidance is given to stylegan 2020-11-14 20:15:48 -07:00
James Betker
3397c83447 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-14 09:30:09 -07:00
James Betker
423ee7cb90 Allow attention to be specified for stylegan2 2020-11-14 09:29:53 -07:00
James Betker
ec621c69b5 Fix train bug 2020-11-14 09:29:08 -07:00
James Betker
cdc5ac30e9 oddity 2020-11-13 20:11:57 -07:00
James Betker
f406a5dd4c Mods to support stylegan2 in SR mode 2020-11-13 20:11:50 -07:00
James Betker
9c3d0b7560 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-13 20:10:47 -07:00
James Betker
67bf55495b Allow hq_batched_key to be specified 2020-11-13 20:10:12 -07:00
James Betker
0b96811611 Fix another issue with gpu ids getting thrown all over hte place 2020-11-13 20:05:52 -07:00
James Betker
c47925ae34 New image extractor utility 2020-11-13 11:04:03 -07:00
James Betker
a07e1a7292 Add separate Evaluator module and FID evaluator 2020-11-13 11:03:54 -07:00
James Betker
080ad61be4 Add option to work with nonrandom latents 2020-11-12 21:23:50 -07:00
James Betker
566b99ca75 GP adjustments for stylegan2 2020-11-12 16:44:51 -07:00
James Betker
fc55bdb24e Mods to how wandb are integrated 2020-11-12 15:45:25 -07:00
James Betker
44a19cd37c ExtensibleTrainer mods to support advanced checkpointing for stylegan2
Basically: stylegan2 makes use of gradient-based normalizers. These
make it so that I cannot use gradient checkpointing. But I love gradient
checkpointing. It makes things really, really fast and memory conscious.

So - only don't checkpoint when we run the regularizer loss. This is a
bit messy, but speeds up training by at least 20%.

Also: pytorch: please make checkpointing a first class citizen.
2020-11-12 15:45:07 -07:00
James Betker
db9e9e28a0 Fix an issue where GPU0 was always being used in non-ddp
Frankly, I don't understand how this has ever worked. WTF.
2020-11-12 15:43:01 -07:00
James Betker
2d3449d7a5 stylegan2 in ml art school! 2020-11-12 15:42:05 -07:00
James Betker
fd97573085 Fixes 2020-11-11 21:49:06 -07:00
James Betker
88f349bdf1 Enable usage of wandb 2020-11-11 21:48:56 -07:00
James Betker
1c065c41b4 Revert "..."
This reverts commit 4b92191880.
2020-11-11 17:24:27 -07:00
James Betker
4b92191880 ... 2020-11-11 14:12:40 -07:00
James Betker
12b57bbd03 Add residual blocks to pyramid disc 2020-11-11 13:56:45 -07:00
James Betker
b4136d766a Back to pyramids, no rrdb 2020-11-11 13:40:24 -07:00
James Betker
42a97de756 Convert PyramidRRDBDisc to RRDBDisc
Had numeric stability issues. This probably makes more sense anyways.
2020-11-11 12:14:14 -07:00
James Betker
72762f200c PyramidRRDB net 2020-11-11 11:25:49 -07:00
James Betker
a1760f8969 Adapt srg2 for video 2020-11-10 16:16:41 -07:00
James Betker
b742d1e5a5 When skipping steps via "every", still run nontrainable injection points 2020-11-10 16:09:17 -07:00
James Betker
91d27372e4 rrdb with adain latent 2020-11-10 16:08:54 -07:00
James Betker
6a2fd5f7d0 Lots of new discriminator nets 2020-11-10 16:06:54 -07:00
James Betker
4e5ba61ae7 SRG2classic further re-integration 2020-11-10 16:06:14 -07:00
James Betker
9e2c96ad5d More latent work 2020-11-07 20:38:56 -07:00
James Betker
6be6c92e5d Fix yet ANOTHER OBO error in multi_frame_dataset 2020-11-06 20:38:34 -07:00
James Betker
0cf52ef52c latent work 2020-11-06 20:38:23 -07:00
James Betker
34d319585c Add srflow arch 2020-11-06 20:38:04 -07:00
James Betker
4469d2e661 More work on RRDB with latent 2020-11-05 22:13:05 -07:00
James Betker
62d3b6496b Latent work checkpoint 2020-11-05 13:31:34 -07:00
James Betker
fd6cdba88f RRDB with latent 2020-11-05 10:04:17 -07:00
James Betker
df47d6cbbb More work in support of training flow networks in tandem with generators 2020-11-04 18:07:48 -07:00
James Betker
c21088e238 Fix OBO error in multi_frame_dataset
In some datasets, this meant one frame was included in a sequence where it didn't belong. In datasets with mismatched chunk sizes, this resulted in an error.
2020-11-03 14:32:06 -07:00
James Betker
e990be0449 Improve ignore_first logic 2020-11-03 11:56:32 -07:00
James Betker
658a267bab More work on SSIM/PSNR approximators
- Add a network that accomodates this style of approximator while retaining structure
- Migrate to SSIM approximation
- Add a tool to visualize how these approximators are working
- Fix some issues that came up while doign this work
2020-11-03 08:09:58 -07:00
James Betker
85c545835c Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-02 08:48:15 -07:00
James Betker
f13fdd43ed Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-02 08:47:42 -07:00
James Betker
fed16abc22 Report chunking errors 2020-11-02 08:47:18 -07:00
James Betker
a51daacde2 Fix reporting of d_fake_diff for generators 2020-11-02 08:45:46 -07:00
James Betker
3676f26d94 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-31 20:55:45 -06:00
James Betker
dcfe994fee Add standalone srg2_classic
Trying to investigate how I was so misguided. I *thought* srg2 was considerably
better than RRDB in performance but am not actually seeing that.
2020-10-31 20:55:34 -06:00
James Betker
ea8c20c0e2 Fix bug with multiscale_dataset 2020-10-31 20:54:41 -06:00
James Betker
bb39d3efe5 Bump image corruption factor a bit 2020-10-31 20:50:24 -06:00
James Betker
eb7df63592 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-31 11:09:32 -06:00
James Betker
c2866ad8d2 Disable debugging of comparable pingpong generations 2020-10-31 11:09:10 -06:00
James Betker
7303d8c932 Add psnr approximator 2020-10-31 11:08:55 -06:00
James Betker
565517814e Restore SRG2
Going to try to figure out where SRG lost competitiveness to RRDB..
2020-10-30 14:01:56 -06:00
James Betker
b24ff3c88d Fix bug that causes multiscale dataset to crash 2020-10-30 14:01:24 -06:00
James Betker
74738489b9 Fixes and additional support for progressive zoom 2020-10-30 09:59:54 -06:00
James Betker
a3918fa808 Tecogan & other fixes 2020-10-30 00:19:58 -06:00
James Betker
b316078a15 Fix tecogan_losses fp16 2020-10-29 23:02:20 -06:00
James Betker
3791f95ad0 Enable RRDB to take in reference inputs 2020-10-29 11:07:40 -06:00
James Betker
7d38381d46 Add scaling to rrdb 2020-10-29 09:48:10 -06:00
James Betker
607ff3c67c RRDB with bypass 2020-10-29 09:39:45 -06:00
James Betker
1655b9e242 Fix fast_forward teco loss bug 2020-10-28 17:49:54 -06:00
James Betker
25b007a0f5 Increase jpeg corruption & add error 2020-10-28 17:37:39 -06:00
James Betker
796659b0ac Add 'jpeg-normal' corruption 2020-10-28 16:40:47 -06:00
James Betker
515905e904 Add a min_loss that is DDP compatible 2020-10-28 15:46:59 -06:00
James Betker
f133243ac8 Extra logging for teco_resgen 2020-10-28 15:21:22 -06:00
James Betker
2ab5054d4c Add noise to teco disc 2020-10-27 22:48:23 -06:00
James Betker
4dc16d5889 Upgrade tecogan_losses for speed 2020-10-27 22:40:15 -06:00
James Betker
ac3da0c5a6 Make tecogen functional 2020-10-27 21:08:59 -06:00
James Betker
10da206db6 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:59 -06:00
James Betker
9848f4c6cb Add teco_resgen 2020-10-27 20:59:55 -06:00
James Betker
543c384a91 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:16 -06:00
James Betker
da53090ce6 More adjustments to support distributed training with teco & on multi_modal_train 2020-10-27 20:58:03 -06:00
James Betker
00bb568956 further checkpointify spsr_arch 2020-10-27 17:54:28 -06:00
James Betker
c2727a0150 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 15:24:19 -06:00
James Betker
2a3eec8fd7 Fix some distributed training snafus 2020-10-27 15:24:05 -06:00
James Betker
d923a62ed3 Allow SPSR to checkpoint 2020-10-27 15:23:20 -06:00
James Betker
11a9e223a6 Retrofit SPSR_arch so it is capable of accepting a ref 2020-10-27 11:14:36 -06:00
James Betker
8202ee72b9 Re-add original SPSR_arch 2020-10-27 11:00:38 -06:00
James Betker
31cf1ac98d Retrofit full_image_dataset to work with new arch. 2020-10-27 10:26:19 -06:00
James Betker
ade0a129da Include psnr in test.py 2020-10-27 10:25:42 -06:00
James Betker
231137ab0a Revert RRDB back to original model 2020-10-27 10:25:31 -06:00
James Betker
1ce863849a Remove temporary base_model change 2020-10-26 11:13:01 -06:00
James Betker
54accfa693 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-26 11:12:37 -06:00
James Betker
ff58c6484a Fixes to unified chunk datasets to support stereoscopic training 2020-10-26 11:12:22 -06:00
James Betker
b2f803588b Fix multi_modal_train.py 2020-10-26 11:10:22 -06:00
James Betker
f857eb00a8 Allow tecogan losses to compute at 32px 2020-10-26 11:09:55 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
85c07f85d9 Update flownet submodule 2020-10-24 11:59:00 -06:00
James Betker
327cdbe110 Support configurable multi-modal training 2020-10-24 11:57:39 -06:00
James Betker
9c3d059ef0 Updates to be able to train flownet2 in ExtensibleTrainer
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
1dbcbfbac8 Restore ChainedEmbeddingGenWithStructure
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
8e5b6682bf Add PairedFrameDataset 2020-10-23 20:58:07 -06:00
James Betker
7a75d10784 Arch cleanup 2020-10-23 09:35:33 -06:00
James Betker
646d6a621a Support 4x zoom on ChainedEmbeddingGen 2020-10-23 09:25:58 -06:00
James Betker
8636492db0 Copy train.py mods to train2 2020-10-22 17:16:36 -06:00
James Betker
e9c0b9f0fd More adjustments to support multi-modal training
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f Class-ify train.py and workon multi-modal trainer 2020-10-22 16:15:31 -06:00
James Betker
15e00e9014 Finish integration with autocast
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3 Add multi-modal trainer 2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8 Fix multifaceted chain gen 2020-10-22 13:27:06 -06:00
James Betker
f9dc472f63 Misc nonfunctional mods to datasets 2020-10-22 10:16:17 -06:00
James Betker
43c4f92123 Collapse progressive zoom candidates into the batch dimension
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
James Betker
680d635420 Enable ExtensibleTrainer to skip steps when state keys are missing 2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1 Add FFT injector 2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms 2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0 Allow recurrent losses to be weighted 2020-10-21 16:59:44 -06:00
James Betker
5753e77d67 ChainedGen: Output debugging information on blocks 2020-10-21 16:36:23 -06:00
James Betker
b54de69153 Misc 2020-10-21 11:08:21 -06:00
James Betker
71c3820d2d Fix process_video 2020-10-21 11:08:12 -06:00
James Betker
3c6e600e48 Add capacity for models to self-report visuals 2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b Add bypass to ChainedEmbeddingGen 2020-10-21 11:07:45 -06:00
James Betker
d8c6a4bbb8 Misc 2020-10-20 12:56:52 -06:00
James Betker
aba83e7497 Don't apply jpeg corruption & noise corruption together
This causes some severe noise.
2020-10-20 12:56:35 -06:00
James Betker
111450f4e7 Use areal interpolate for multiscale_dataset 2020-10-19 15:30:25 -06:00
James Betker
a63bf2ea2f Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086 Restore test.py for use as standalone validator 2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8 Fix recurrent=None bug in ChainedEmbeddingGen 2020-10-19 15:25:12 -06:00
James Betker
331c40f0c8 Allow starting step to be forced
Useful for testing purposes or to force a validation.
2020-10-19 15:23:04 -06:00
James Betker
8ca566b621 Revert "Misc"
This reverts commit 0e3ea63a14.

# Conflicts:
#	codes/test.py
#	codes/train.py
2020-10-19 13:34:54 -06:00
James Betker
b28e4d9cc7 Add spread loss
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
9b9a6e5925 Add get_paths() to base_unsupervised_image_dataset 2020-10-19 11:30:06 -06:00
James Betker
981d64413b Support validation over a custom injector
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
ffad0e0422 Allow image corruption in multiscale dataset 2020-10-19 10:10:27 -06:00
James Betker
668cafa798 Push correct patch of recurrent embedding to upstream image, rather than whole thing 2020-10-18 22:39:52 -06:00
James Betker
7df378a944 Remove separated vgg discriminator
Checkpointing happens inline instead. Was a dumb idea..

Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5 Fix memory leak with recurrent loss 2020-10-18 10:22:10 -06:00
James Betker
552e70a032 Get rid of excessive checkpointed disc params 2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813 Add a checkpointable discriminator 2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08 Multiscale training in! 2020-10-17 22:54:12 -06:00
James Betker
e706911c83 Fix spinenet bug 2020-10-17 20:20:36 -06:00
James Betker
b008a27d39 Spinenet should allow bypassing the initial conv
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c7f3fc4dd9 Enable chunk_with_reference to work without centers
Moving away from this so it doesn't matter too much. Also fixes an issue
with the "ignore" flag.
2020-10-17 20:09:08 -06:00
James Betker
b45e132a9d Allow first n tiles to be ignored
Helps zoom in with chunked dataset
2020-10-17 09:45:03 -06:00
James Betker
c1c9c5681f Swap recurrence 2020-10-17 08:40:28 -06:00
James Betker
6141aa1110 More recurrence fixes for chainedgen 2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b Allow recurrence to specified for chainedgen 2020-10-17 08:32:29 -06:00
James Betker
fc4c064867 Add recurrent support to chainedgenwithstructure 2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2 Don't use several stages of spinenet_arch
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339 Go back to torch's DDP
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e Add ChainedGenWithStructure 2020-10-16 20:44:36 -06:00
James Betker
96f1be30ed Add use_generator_as_filter 2020-10-16 20:43:55 -06:00
James Betker
617d97e19d Add ChainedEmbeddingGen 2020-10-15 23:18:08 -06:00
James Betker
c4543ce124 Set post_transform_block to None where applicable 2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb SSGSimpler network 2020-10-15 17:18:44 -06:00
James Betker
1ba01d69b5 Move datasets to INTER_AREA interpolation for downsizing
Looks **FAR** better visually
2020-10-15 17:18:23 -06:00
James Betker
d56745b2ec JPEG-broad adjustment 2020-10-15 10:14:51 -06:00
James Betker
eda75c9779 Cleanup fixes 2020-10-15 10:13:17 -06:00
James Betker
920865defb Arch work 2020-10-15 10:13:06 -06:00
James Betker
1dc0b05428 Add multiscale dataset 2020-10-15 10:12:50 -06:00
James Betker
0f4e03183f New image corruptor gradations 2020-10-15 10:12:25 -06:00
James Betker
1f20d59c31 Revert big switch back 2020-10-14 11:03:34 -06:00
James Betker
9815980329 Update SwitchedConv 2020-10-13 20:57:12 -06:00
James Betker
24792bdb4f Codebase cleanup
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba Mods to support video processing with teco networks 2020-10-13 20:47:05 -06:00
James Betker
17d78195ee Mods to SRG to support returning switch logits 2020-10-13 20:46:37 -06:00
James Betker
cc915303a5 Fix SPSR calls into SwitchComputer 2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899 Merge remote-tracking branch 'origin/gan_lab' into gan_lab
# Conflicts:
#	codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9 Add the "BigSwitch" 2020-10-13 10:11:10 -06:00
James Betker
8014f050ac Clear metrics properly
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-12 17:43:51 -06:00
James Betker
731700ab2c checkpoint in ssg 2020-10-12 17:43:28 -06:00
James Betker
ca523215c6 Fix recurrent std in arch 2020-10-12 17:42:32 -06:00
James Betker
05377973bf Allow initial recurrent input to be specified (optionally) 2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6 Add ssgr1 recurrence 2020-10-12 17:18:19 -06:00
James Betker
c1a00f31b7 Update switched_conv 2020-10-12 10:37:45 -06:00
James Betker
d7d7590f3e Fix constant injector - wasn't working in test 2020-10-12 10:36:30 -06:00
James Betker
e7cf337dba Fix bug with chunk_with_reference 2020-10-12 10:23:03 -06:00
James Betker
ce163ad4a9 Update SSGdeep 2020-10-12 10:22:08 -06:00
James Betker
2bc5701b10 misc 2020-10-12 10:21:25 -06:00
James Betker
3409d88a1c Add PANet arch 2020-10-12 10:20:55 -06:00
James Betker
7cbf4fa665 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-11 08:33:30 -06:00
James Betker
92cb83958a Return zeros rather than None when image cant be read 2020-10-11 08:33:18 -06:00
James Betker
a9c2e97391 Constant injector and teco fixes 2020-10-11 08:20:07 -06:00
James Betker
e785029936 Mods needed to support SPSR archs with teco gan 2020-10-10 22:39:55 -06:00
James Betker
120072d464 Add constant injector 2020-10-10 21:50:23 -06:00
James Betker
f99812e14d Fix tecogan_losses errors 2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7 Alter teco_losses to feed a recurrent input in as separate 2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d Add MarginRemoval injector 2020-10-09 20:35:56 -06:00
James Betker
0011d445c8 Fix loss indexing 2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc For element loss added 2020-10-09 19:51:44 -06:00
James Betker
61e5047c60 Fix loss accumulator when buffers are not filled
They were reporting incorrect losses.
2020-10-09 19:47:59 -06:00
James Betker
fe50d6f9d0 Fix attention images 2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c Allow tecogan to be used in process_video 2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69 Add network architecture built for teco 2020-10-09 08:40:14 -06:00
James Betker
b3d0baaf17 Improve multiframe dataset memory usage 2020-10-09 08:40:00 -06:00
James Betker
afe6af88af Fix attention print issue 2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4 Converge SSG architectures into unified switching base class
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
3cc56cd00b Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-08 16:12:05 -06:00
James Betker
7d8d9dafbb misc 2020-10-08 16:12:00 -06:00
James Betker
856ef4d21d Update switched_conv 2020-10-08 16:10:23 -06:00
James Betker
1eb516d686 Fix more distributed bugs 2020-10-08 14:32:45 -06:00
James Betker
b36ba0460c Fix multi-frame dataset OBO error 2020-10-08 12:21:04 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5 Allow tecogan to support generators that only output a tensor (instead of a list) 2020-10-08 09:26:25 -06:00
James Betker
969bcd9021 Use local checkpoint in SSG 2020-10-08 08:54:46 -06:00
James Betker
c93dd623d7 Tecogan losses work 2020-10-07 23:11:58 -06:00
James Betker
29bf78d791 Update switched_conv submodule 2020-10-07 23:11:50 -06:00
James Betker
c96f5b2686 Import switched_conv as a submodule 2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4 More tecogan fixes 2020-10-07 12:41:17 -06:00
James Betker
a62a5dbb5f Clone and detach in recursively_detach 2020-10-07 12:41:00 -06:00
James Betker
1c44d395af Tecogan work
Its training!  There's still probably plenty of bugs though..
2020-10-07 09:03:30 -06:00
James Betker
e9d7371a61 Add concatenate injector 2020-10-07 09:02:42 -06:00
James Betker
8a7e993aea Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-06 20:41:58 -06:00
James Betker
b2c4b2a16d Move gpu_ids out of if statement 2020-10-06 20:40:20 -06:00
James Betker
1e415b249b Add tag that can be applied to prevent parameter training 2020-10-06 20:39:49 -06:00
James Betker
2f2e3f33f8 StackedSwitchedGenerator_5lyr 2020-10-06 20:39:32 -06:00
James Betker
6217b48e3f Fix spsr_arch bug 2020-10-06 20:38:47 -06:00
James Betker
4290918359 Add distributed_checkpoint for more efficient checkpoints 2020-10-06 20:38:38 -06:00
James Betker
cffc596141 Integrate flownet2 into codebase, add teco visual debugs 2020-10-06 20:35:39 -06:00
James Betker
e4b89a172f Reduce spsr7 memory usage 2020-10-05 22:05:56 -06:00
James Betker
4111942ada Support attention deferral in deep ssgr 2020-10-05 19:35:55 -06:00
James Betker
840927063a Work on tecogan losses 2020-10-05 19:35:28 -06:00
James Betker
0e3ea63a14 Misc 2020-10-05 18:01:50 -06:00
James Betker
2875822024 SPSR9 arch
takes some of the stuff I learned with SGSR yesterday and applies it to spsr
2020-10-05 08:47:51 -06:00
James Betker
51044929af Don't compute attention statistics on multiple generator invocations of the same data 2020-10-05 00:34:29 -06:00
James Betker
e760658fdb Another fix.. 2020-10-04 21:08:00 -06:00
James Betker
a890e3a9c0 Fix geometric loss not handling 0 index 2020-10-04 21:05:01 -06:00
James Betker
c3ef8a4a31 Stacked switches - return a tuple 2020-10-04 21:02:24 -06:00
James Betker
13f97e1e97 Add recursive loss 2020-10-04 20:48:15 -06:00
James Betker
ffd069fd97 Lots of SSG work
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
aca2c7ab41 Full checkpoint-ize SSG1 2020-10-04 18:24:52 -06:00
James Betker
fc396baf1a Move loaded_options to util
Doesn't seem to work with python 3.6
2020-10-03 20:29:06 -06:00
James Betker
2d8e9a9d30 Options fix? 2020-10-03 20:27:12 -06:00
James Betker
e3294939b0 Revert "SSG: offer option to use BN-based attention normalization"
Didn't work. Oh well.

This reverts commit 5cd2b37591.
2020-10-03 17:54:53 -06:00
James Betker
43c6c67fd1 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-03 16:17:31 -06:00
James Betker
5cd2b37591 SSG: offer option to use BN-based attention normalization
Not sure how this is going to work, lets try it.
2020-10-03 16:16:19 -06:00
James Betker
c896939523 Fix recursive checkpoint 2020-10-03 16:15:52 -06:00
James Betker
3cbb9ecd45 Misc 2020-10-03 16:15:42 -06:00
James Betker
35731502c3 Fix checkpoint recursion 2020-10-03 12:52:50 -06:00
James Betker
9b4ed82093 Get rid of unused convs in spsr7 2020-10-03 11:36:26 -06:00
James Betker
b2b81b13a4 Remove recursive utils import 2020-10-03 11:30:05 -06:00
James Betker
3561cc164d Fix up fea_loss calculator (for validation)
Not sure how this was working in regular training mode, but it
was failing in DDP.
2020-10-03 11:19:20 -06:00
James Betker
21d3bb83b2 Use tqdm reporting with validation 2020-10-03 11:16:39 -06:00
James Betker
6c9718ad64 Don't log if you aren't 0 rank 2020-10-03 11:14:13 -06:00
James Betker
922b1d76df Don't record visuals when not on rank 0 2020-10-03 11:10:03 -06:00
James Betker
8197fd646f Don't accumulate losses for metrics when the loss isn't a tensor 2020-10-03 11:03:55 -06:00
James Betker
19a4075e1e Allow checkpointing to be disabled in the options file
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
dd9d7b27ac Add more sophisticated mechanism for balancing GAN losses 2020-10-02 22:53:42 -06:00
James Betker
39865ca3df TOTAL_loss, dumbo 2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655 Loss accumulator fix 2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4 ExtensibleTrainer - don't compute backward when there is no loss 2020-10-02 20:54:06 -06:00
James Betker
146a9125f2 Modify geometric & translational losses so they can be used with embeddings 2020-10-02 20:40:13 -06:00
James Betker
e30a1443cd Change sw2 refs 2020-10-02 09:01:18 -06:00
James Betker
e38716925f Fix spsr8 class init 2020-10-02 09:00:18 -06:00
James Betker
efbf6b737b Update validate_data to work with SingleImageDataset 2020-10-02 08:58:34 -06:00
James Betker
35469f08e2 Spsr 8 2020-10-02 08:58:15 -06:00