Commit Graph

724 Commits

Author SHA1 Message Date
James Betker
3074f41877 Get rosinality model converter to work
Mostly, just needed to remove the custom cuda ops, not so bueno on Windows.
2020-12-17 16:03:39 -07:00
James Betker
e838c6e75b Rosinality stylegan2 port 2020-12-17 14:18:46 -07:00
James Betker
49327b99fe SRFlow outputs RRDB output 2020-12-16 10:28:02 -07:00
James Betker
c25b49bb12 Clean up of SRFlowNet_arch 2020-12-16 10:27:38 -07:00
James Betker
42ac8e3eeb Remove unnecessary comment from SRFlowNet 2020-12-16 09:43:07 -07:00
James Betker
09de3052ac Add softmax to spinenet classification head 2020-12-16 09:42:15 -07:00
James Betker
8661207d57 Merge branch 'gan_lab' of https://github.com/neonbjb/DL-Art-School into gan_lab 2020-12-15 17:16:48 -07:00
James Betker
fc376d34b2 Spinenet with logits head 2020-12-15 17:16:19 -07:00
James Betker
0a19e53df0 BYOL mods 2020-12-14 23:59:11 -07:00
James Betker
ef7eabf457 Allow RRDB to upscale 8x 2020-12-14 23:58:52 -07:00
James Betker
ec0ee25f4b Structural latents checkpoint 2020-12-11 12:01:09 -07:00
James Betker
26ceca68c0 BYOL with structure! 2020-12-10 15:07:35 -07:00
James Betker
c203cee31e Allow swapping to torch DDP as needed in code 2020-12-09 15:03:59 -07:00
James Betker
97ff25a086 BYOL!
Man, is there anything ExtensibleTrainer can't train? :)
2020-12-08 13:07:53 -07:00
James Betker
bca59ed98a Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-12-07 12:51:04 -07:00
James Betker
ea56eb61f0 Fix DDP errors for discriminator
- Don't define training_net in define_optimizers - this drops the shell and leads to problems downstream
- Get rid of support for multiple training nets per opt. This was half baked and needs a better solution if needed
   downstream.
2020-12-07 12:50:57 -07:00
James Betker
88fc049c8d spinenet latent playground! 2020-12-05 20:30:36 -07:00
James Betker
11155aead4 Directly use dataset keys
This has been a long time coming. Cleans up messy "GT" nomenclature and simplifies ExtensibleTraner.feed_data
2020-12-04 20:14:53 -07:00
James Betker
8a83b1c716 Go back to apex DDP, fix distributed bugs 2020-12-04 16:39:21 -07:00
James Betker
7a81d4e2f4 Revert gaussian loss changes 2020-12-04 12:49:20 -07:00
James Betker
711780126e Cleanup 2020-12-03 23:42:51 -07:00
James Betker
ac7256d4a3 Do tqdm reporting when calculating flow_gaussian_nll 2020-12-03 23:42:29 -07:00
James Betker
dc9ff8e05b Allow the majority of the srflow steps to checkpoint 2020-12-03 23:41:57 -07:00
James Betker
06d1c62c5a iGPT support!
Sweeeeet
2020-12-03 15:32:21 -07:00
James Betker
c18adbd606 Delete mdcn & panet
Garbage, all of it.
2020-12-02 22:25:57 -07:00
James Betker
f2880b33c9 Get rid of mean shift from MDCN 2020-12-02 14:18:33 -07:00
James Betker
8a00f15746 Implement FlowGaussianNll evaluator 2020-12-02 14:09:54 -07:00
James Betker
edf408508c Fix discriminator 2020-12-01 17:45:56 -07:00
James Betker
9a421a41f4 SRFlow: accomodate mismatches between global scale and flow_scale 2020-12-01 11:11:51 -07:00
James Betker
e343722d37 Add stepped rrdb 2020-12-01 11:11:15 -07:00
James Betker
2e0bbda640 Remove unused archs 2020-12-01 11:10:48 -07:00
James Betker
a1c8300052 Add mdcn 2020-11-30 16:14:21 -07:00
James Betker
1e0f69e34b extra_conv in gn discriminator, multiframe support in rrdb. 2020-11-29 15:39:50 -07:00
James Betker
da604752e6 Misc RRDB changes 2020-11-29 12:21:31 -07:00
James Betker
a1d4c9f83c multires rrdb work 2020-11-28 14:35:46 -07:00
James Betker
929cd45c05 Fix for RRDB scale 2020-11-27 21:37:10 -07:00
James Betker
71fa532356 Adjustments to how flow networks set size and scale 2020-11-27 21:37:00 -07:00
James Betker
6f958bb150 Maybe this is necessary after all? 2020-11-27 15:21:13 -07:00
James Betker
ef8d5f88c1 Bring split gaussian nll out of split so it can be computed accurately with the rest of the nll component 2020-11-27 13:30:21 -07:00
James Betker
4ab49b0d69 RRDB disc work 2020-11-27 12:03:08 -07:00
James Betker
6de4dabb73 Remove srflow (modified version)
Starting from orig and re-working from there.
2020-11-27 12:02:06 -07:00
James Betker
fd356580c0 Play with lambdas 2020-11-26 20:30:55 -07:00
James Betker
cb045121b3 Expose srflow rrdb 2020-11-24 13:20:20 -07:00
James Betker
f6098155cd Mods to tecogan to allow use of embeddings as input 2020-11-24 09:24:02 -07:00
James Betker
b10bcf6436 Rework stylegan_for_sr to incorporate structure as an adain block 2020-11-23 11:31:11 -07:00
James Betker
519ba6f10c Support 2x RRDB with 4x srflow 2020-11-21 14:46:15 -07:00
James Betker
cad92bada8 Report logp and logdet for srflow 2020-11-21 10:13:05 -07:00
James Betker
c37d3faa58 More adjustments to srflow_orig 2020-11-20 19:38:33 -07:00
James Betker
d51d12a41a Adjustments to srflow to (maybe?) fix training 2020-11-20 14:44:24 -07:00
James Betker
6c8c35ac47 Support training RRDB encoder [srflow] 2020-11-20 10:03:06 -07:00
James Betker
5ccdbcefe3 srflow_orig integration 2020-11-19 23:47:24 -07:00
James Betker
2b2d754d8e Bring in an original SRFlow implementation for reference 2020-11-19 21:42:39 -07:00
James Betker
1e0d7be3ce "Clean up" SRFlow 2020-11-19 21:42:24 -07:00
James Betker
d7877d0a36 Fixes to teco losses and translational losses 2020-11-19 11:35:05 -07:00
James Betker
5c10264538 Remove pyramid_disc hard dependencies 2020-11-17 18:34:11 -07:00
James Betker
6b679e2b51 Make grad_penalty available to classical discs 2020-11-17 18:31:40 -07:00
James Betker
8a19c9ae15 Add additive mode to rrdb 2020-11-16 20:45:09 -07:00
James Betker
2a507987df Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-15 16:16:30 -07:00
James Betker
931ed903c1 Allow combined additive loss 2020-11-15 16:16:18 -07:00
James Betker
4b68116977 import fix 2020-11-15 16:15:42 -07:00
James Betker
98eada1e4c More circular dependency fixes + unet fixes 2020-11-15 11:53:35 -07:00
James Betker
e587d549f7 Fix circular imports 2020-11-15 11:32:35 -07:00
James Betker
99f0cfaab5 Rework stylegan2 divergence losses
Notably: include unet loss
2020-11-15 11:26:44 -07:00
James Betker
ea94b93a37 Fixes for unet 2020-11-15 10:38:33 -07:00
James Betker
89f56b2091 Fix another import 2020-11-14 22:10:45 -07:00
James Betker
9af049c671 Import fix for unet 2020-11-14 22:09:18 -07:00
James Betker
5cade6b874 Move stylegan2 around, bring in unet 2020-11-14 22:04:48 -07:00
James Betker
125cb16dce Add a FID evaluator for stylegan with structural guidance 2020-11-14 20:16:07 -07:00
James Betker
c9258e2da3 Alter how structural guidance is given to stylegan 2020-11-14 20:15:48 -07:00
James Betker
3397c83447 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-14 09:30:09 -07:00
James Betker
423ee7cb90 Allow attention to be specified for stylegan2 2020-11-14 09:29:53 -07:00
James Betker
f406a5dd4c Mods to support stylegan2 in SR mode 2020-11-13 20:11:50 -07:00
James Betker
9c3d0b7560 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-11-13 20:10:47 -07:00
James Betker
67bf55495b Allow hq_batched_key to be specified 2020-11-13 20:10:12 -07:00
James Betker
0b96811611 Fix another issue with gpu ids getting thrown all over hte place 2020-11-13 20:05:52 -07:00
James Betker
a07e1a7292 Add separate Evaluator module and FID evaluator 2020-11-13 11:03:54 -07:00
James Betker
080ad61be4 Add option to work with nonrandom latents 2020-11-12 21:23:50 -07:00
James Betker
566b99ca75 GP adjustments for stylegan2 2020-11-12 16:44:51 -07:00
James Betker
44a19cd37c ExtensibleTrainer mods to support advanced checkpointing for stylegan2
Basically: stylegan2 makes use of gradient-based normalizers. These
make it so that I cannot use gradient checkpointing. But I love gradient
checkpointing. It makes things really, really fast and memory conscious.

So - only don't checkpoint when we run the regularizer loss. This is a
bit messy, but speeds up training by at least 20%.

Also: pytorch: please make checkpointing a first class citizen.
2020-11-12 15:45:07 -07:00
James Betker
db9e9e28a0 Fix an issue where GPU0 was always being used in non-ddp
Frankly, I don't understand how this has ever worked. WTF.
2020-11-12 15:43:01 -07:00
James Betker
2d3449d7a5 stylegan2 in ml art school! 2020-11-12 15:42:05 -07:00
James Betker
fd97573085 Fixes 2020-11-11 21:49:06 -07:00
James Betker
88f349bdf1 Enable usage of wandb 2020-11-11 21:48:56 -07:00
James Betker
1c065c41b4 Revert "..."
This reverts commit 4b92191880.
2020-11-11 17:24:27 -07:00
James Betker
4b92191880 ... 2020-11-11 14:12:40 -07:00
James Betker
12b57bbd03 Add residual blocks to pyramid disc 2020-11-11 13:56:45 -07:00
James Betker
b4136d766a Back to pyramids, no rrdb 2020-11-11 13:40:24 -07:00
James Betker
42a97de756 Convert PyramidRRDBDisc to RRDBDisc
Had numeric stability issues. This probably makes more sense anyways.
2020-11-11 12:14:14 -07:00
James Betker
72762f200c PyramidRRDB net 2020-11-11 11:25:49 -07:00
James Betker
a1760f8969 Adapt srg2 for video 2020-11-10 16:16:41 -07:00
James Betker
b742d1e5a5 When skipping steps via "every", still run nontrainable injection points 2020-11-10 16:09:17 -07:00
James Betker
91d27372e4 rrdb with adain latent 2020-11-10 16:08:54 -07:00
James Betker
6a2fd5f7d0 Lots of new discriminator nets 2020-11-10 16:06:54 -07:00
James Betker
4e5ba61ae7 SRG2classic further re-integration 2020-11-10 16:06:14 -07:00
James Betker
9e2c96ad5d More latent work 2020-11-07 20:38:56 -07:00
James Betker
0cf52ef52c latent work 2020-11-06 20:38:23 -07:00
James Betker
34d319585c Add srflow arch 2020-11-06 20:38:04 -07:00
James Betker
4469d2e661 More work on RRDB with latent 2020-11-05 22:13:05 -07:00
James Betker
62d3b6496b Latent work checkpoint 2020-11-05 13:31:34 -07:00
James Betker
fd6cdba88f RRDB with latent 2020-11-05 10:04:17 -07:00
James Betker
df47d6cbbb More work in support of training flow networks in tandem with generators 2020-11-04 18:07:48 -07:00
James Betker
658a267bab More work on SSIM/PSNR approximators
- Add a network that accomodates this style of approximator while retaining structure
- Migrate to SSIM approximation
- Add a tool to visualize how these approximators are working
- Fix some issues that came up while doign this work
2020-11-03 08:09:58 -07:00
James Betker
a51daacde2 Fix reporting of d_fake_diff for generators 2020-11-02 08:45:46 -07:00
James Betker
dcfe994fee Add standalone srg2_classic
Trying to investigate how I was so misguided. I *thought* srg2 was considerably
better than RRDB in performance but am not actually seeing that.
2020-10-31 20:55:34 -06:00
James Betker
eb7df63592 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-31 11:09:32 -06:00
James Betker
c2866ad8d2 Disable debugging of comparable pingpong generations 2020-10-31 11:09:10 -06:00
James Betker
7303d8c932 Add psnr approximator 2020-10-31 11:08:55 -06:00
James Betker
565517814e Restore SRG2
Going to try to figure out where SRG lost competitiveness to RRDB..
2020-10-30 14:01:56 -06:00
James Betker
74738489b9 Fixes and additional support for progressive zoom 2020-10-30 09:59:54 -06:00
James Betker
a3918fa808 Tecogan & other fixes 2020-10-30 00:19:58 -06:00
James Betker
b316078a15 Fix tecogan_losses fp16 2020-10-29 23:02:20 -06:00
James Betker
3791f95ad0 Enable RRDB to take in reference inputs 2020-10-29 11:07:40 -06:00
James Betker
7d38381d46 Add scaling to rrdb 2020-10-29 09:48:10 -06:00
James Betker
607ff3c67c RRDB with bypass 2020-10-29 09:39:45 -06:00
James Betker
1655b9e242 Fix fast_forward teco loss bug 2020-10-28 17:49:54 -06:00
James Betker
515905e904 Add a min_loss that is DDP compatible 2020-10-28 15:46:59 -06:00
James Betker
f133243ac8 Extra logging for teco_resgen 2020-10-28 15:21:22 -06:00
James Betker
2ab5054d4c Add noise to teco disc 2020-10-27 22:48:23 -06:00
James Betker
4dc16d5889 Upgrade tecogan_losses for speed 2020-10-27 22:40:15 -06:00
James Betker
ac3da0c5a6 Make tecogen functional 2020-10-27 21:08:59 -06:00
James Betker
10da206db6 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:59 -06:00
James Betker
9848f4c6cb Add teco_resgen 2020-10-27 20:59:55 -06:00
James Betker
543c384a91 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 20:59:16 -06:00
James Betker
da53090ce6 More adjustments to support distributed training with teco & on multi_modal_train 2020-10-27 20:58:03 -06:00
James Betker
00bb568956 further checkpointify spsr_arch 2020-10-27 17:54:28 -06:00
James Betker
c2727a0150 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-27 15:24:19 -06:00
James Betker
2a3eec8fd7 Fix some distributed training snafus 2020-10-27 15:24:05 -06:00
James Betker
d923a62ed3 Allow SPSR to checkpoint 2020-10-27 15:23:20 -06:00
James Betker
11a9e223a6 Retrofit SPSR_arch so it is capable of accepting a ref 2020-10-27 11:14:36 -06:00
James Betker
8202ee72b9 Re-add original SPSR_arch 2020-10-27 11:00:38 -06:00
James Betker
231137ab0a Revert RRDB back to original model 2020-10-27 10:25:31 -06:00
James Betker
1ce863849a Remove temporary base_model change 2020-10-26 11:13:01 -06:00
James Betker
54accfa693 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-26 11:12:37 -06:00
James Betker
ff58c6484a Fixes to unified chunk datasets to support stereoscopic training 2020-10-26 11:12:22 -06:00
James Betker
f857eb00a8 Allow tecogan losses to compute at 32px 2020-10-26 11:09:55 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
85c07f85d9 Update flownet submodule 2020-10-24 11:59:00 -06:00
James Betker
9c3d059ef0 Updates to be able to train flownet2 in ExtensibleTrainer
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
1dbcbfbac8 Restore ChainedEmbeddingGenWithStructure
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
7a75d10784 Arch cleanup 2020-10-23 09:35:33 -06:00
James Betker
646d6a621a Support 4x zoom on ChainedEmbeddingGen 2020-10-23 09:25:58 -06:00
James Betker
e9c0b9f0fd More adjustments to support multi-modal training
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f Class-ify train.py and workon multi-modal trainer 2020-10-22 16:15:31 -06:00
James Betker
15e00e9014 Finish integration with autocast
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3 Add multi-modal trainer 2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8 Fix multifaceted chain gen 2020-10-22 13:27:06 -06:00
James Betker
43c4f92123 Collapse progressive zoom candidates into the batch dimension
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
James Betker
680d635420 Enable ExtensibleTrainer to skip steps when state keys are missing 2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1 Add FFT injector 2020-10-21 22:22:00 -06:00