James Betker
2cdac6bd09
Add PWCNet for human optical flow
2021-01-25 08:25:44 -07:00
James Betker
51b63b2aa6
Add switched_conv with hard routing and make vqvae use it.
2021-01-25 08:25:29 -07:00
James Betker
ae4ff4a1e7
Enable lambda visualization
2021-01-23 15:53:27 -07:00
James Betker
10ec6bda1d
lambda nets in switched_conv and a vqvae to use it
2021-01-23 14:57:57 -07:00
James Betker
b374dcdd46
update vqvae to double codebook size for bottom quantizer
2021-01-23 13:47:07 -07:00
James Betker
1b8a26db93
New switched_conv
2021-01-23 13:46:30 -07:00
James Betker
d919ae7148
Add VQVAE with no Conv2dTranspose
2021-01-18 08:49:59 -07:00
James Betker
587a4f4050
resnet_unet_3
...
I'm being really lazy here - these nets are not really different from each other
except at which layer they terminate. This one terminates at 2x downsampling,
which is simply indicative of a direction I want to go for testing these pixpro networks.
2021-01-15 14:51:03 -07:00
James Betker
038b8654b6
Pixpro: unwrap losses
2021-01-13 11:54:25 -07:00
James Betker
8990801a3f
Fix pixpro stochastic sampling bugs
2021-01-13 11:34:24 -07:00
James Betker
19475a072f
Pixpro: Rather than using a latent square for pixpro, use an entirely stochastic sampling of the pixels
2021-01-13 11:26:51 -07:00
James Betker
d1007ccfe7
Adjustments to pixpro to allow training against networks with arbitrarily large structural latents
...
- The pixpro latent now rescales the latent space instead of using a "coordinate vector", which
**might** have performance implications.
- The latent against which the pixel loss is computed can now be a small, randomly sampled patch
out of the entire latent, allowing further memory/computational discounts. Since the loss
computation does not have a receptive field, this should not alter the loss.
- The instance projection size can now be separate from the pixel projection size.
- PixContrast removed entirely.
- ResUnet with full resolution added.
2021-01-12 09:17:45 -07:00
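The random-patch idea in the commit above can be sketched in plain Python. This is an illustration only, not the repository's implementation: `sample_patch`, the 8x8 grid, and the fixed seed are all hypothetical. It crops a random square window from a 2D latent and returns the offset so anything that must stay aligned with the latent can be cropped the same way.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]


def sample_patch(latent: Grid, size: int, rng: random.Random) -> Tuple[Grid, Tuple[int, int]]:
    """Crop a random size x size window out of a 2D latent.

    Returns the patch and its (top, left) offset. Hypothetical helper.
    """
    height, width = len(latent), len(latent[0])
    if size > height or size > width:
        raise ValueError("patch size exceeds latent dimensions")
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    patch = [row[left:left + size] for row in latent[top:top + size]]
    return patch, (top, left)


# Hypothetical 8x8 latent whose values encode their own coordinates.
latent = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patch, (top, left) = sample_patch(latent, size=3, rng=random.Random(0))
```

Because a per-pixel loss has no receptive field, each sampled position contributes the same loss term it would have contributed in the full latent; only the number of terms changes.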
James Betker
34f8c8641f
Support training imagenet classifier
2021-01-11 20:09:16 -07:00
James Betker
f3db381fa1
Allow uresnet to use pretrained resnet50
2021-01-10 12:57:31 -07:00
James Betker
07168ecfb4
Enable vqvae to use a switched_conv variant
2021-01-09 20:53:14 -07:00
James Betker
5a8156026a
Did anyone ask for k-means clustering?
...
This is so cool...
2021-01-07 22:37:41 -07:00
James Betker
de10c7246a
Add injected noise into bypass maps
2021-01-07 16:31:12 -07:00
James Betker
61a86a3c1e
VQVAE
2021-01-07 10:20:15 -07:00
James Betker
01a589e712
Adjustments to pixpro & resnet-unet
...
I'm not really satisfied with what I got out of these networks on round 1.
Let's try again.
2021-01-06 15:00:46 -07:00
James Betker
2f2f87bbea
Styled SR fixes
2021-01-05 20:14:39 -07:00
James Betker
9fed90393f
Add lucidrains pixpro trainer
2021-01-05 20:14:22 -07:00
James Betker
ade2732c82
Transfer learning for styleSR
...
This is a concept from "Lifelong Learning GAN", although I'm skeptical of its novelty -
basically you scale and shift the weights for the generator and discriminator of a pretrained
GAN to "shift" into new modalities, e.g. faces->birds or whatever. There are some interesting
applications of this that I would like to try out.
2021-01-04 20:10:48 -07:00
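A minimal sketch of the scale-and-shift idea described in the commit above, in plain Python. The `modulate` helper and the identity initialization (scale 1, shift 0) are assumptions for illustration, not code from the repository: the pretrained weights stay frozen and only the per-weight scale and shift would be trained.

```python
from typing import List


def modulate(frozen: List[float], scale: List[float], shift: List[float]) -> List[float]:
    """Return scale * w + shift elementwise; `frozen` is never modified."""
    return [g * w + b for w, g, b in zip(frozen, scale, shift)]


pretrained = [0.5, -1.0, 2.0]

# Assumed identity initialization: the adapted model starts out
# behaving exactly like the pretrained one.
identity = modulate(pretrained, scale=[1.0] * 3, shift=[0.0] * 3)

# After (hypothetical) training, scale and shift move the weights
# toward the new modality.
adapted = modulate(pretrained, scale=[2.0, 1.0, 0.5], shift=[0.0, 0.25, -1.0])
```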
James Betker
2c65b6b28e
More mods to support styledsr
2021-01-04 11:32:28 -07:00
James Betker
2225fe6ac2
Undo lucidrains changes for new discriminator
...
This "new" code will live in the styledsr directory from now on.
2021-01-04 10:57:09 -07:00
James Betker
40ec71da81
Move styled_sr into its own folder
2021-01-04 10:54:34 -07:00
James Betker
5916f5f7d4
Misc fixes
2021-01-04 10:53:53 -07:00
James Betker
4d8064c32c
Modifications to allow partially trained stylegan discriminators to be used
2021-01-03 16:37:18 -07:00
James Betker
bdbab65082
Allow optimizers to train separate param groups, add higher dimensional VGG discriminator
...
Did this to support training 512x512px networks off of a pretrained 256x256 network.
2021-01-02 15:10:06 -07:00
James Betker
193cdc6636
Move discriminators to the create_model paradigm
...
Also cleans up a lot of old discriminator models that I have no intention
of using again.
2021-01-01 15:56:09 -07:00
James Betker
f39179e85a
styled_sr: fix bug when using initial_stride
2021-01-01 12:13:21 -07:00
James Betker
913fc3b75e
Need init to pick up styled_sr
2021-01-01 12:10:32 -07:00
James Betker
e992e18767
Add initial_stride term to style_sr
...
Also fix fid and a networks.py issue.
2021-01-01 11:59:36 -07:00
James Betker
e214e6ce33
Styled SR model
2020-12-31 20:54:18 -07:00
James Betker
b1fb82476b
Add gp debug (fix)
2020-12-30 15:26:54 -07:00
James Betker
63cf3d3126
Injector auto-registration
...
I love it!
2020-12-29 20:58:02 -07:00
James Betker
a777c1e4f9
Misc script fixes
2020-12-29 20:25:09 -07:00
James Betker
ba543d1152
Glean mods
...
- Fixes fixed upscale factor issues
- Refines a few ops to decrease computation & parameterization
2020-12-27 12:25:06 -07:00
James Betker
f9be049adb
GLEAN mod to support custom initial strides
2020-12-26 13:51:14 -07:00
James Betker
3fd627fc62
Mods to support image classification & filtering
2020-12-26 13:49:27 -07:00
James Betker
10fdfa1563
Migrate generators to dynamic model registration
2020-12-24 23:02:10 -07:00
James Betker
29db7c7a02
Further mods to BYOL
2020-12-24 09:28:41 -07:00
James Betker
036684893e
Add LARS optimizer & support for BYOL idiosyncrasies
...
- Added LARS and SGD optimizer variants that support turning off certain
features for BN and bias layers
- Added a variant of pytorch's resnet model that supports gradient checkpointing.
- Modify the trainer infrastructure to support above
- Fix bug with BYOL (should have been nonfunctional)
2020-12-23 20:33:43 -07:00
James Betker
1bbcb96ee8
Implement a few changes to support training BYOL networks
2020-12-23 10:50:23 -07:00
James Betker
ae666dc520
Fix bugs with srflow after refactor
2020-12-19 10:28:23 -07:00
James Betker
4328c2f713
Change default leaky ReLU slope to .2 (BREAKS COMPATIBILITY)
...
This conforms my ConvGnLelu implementation to the generally accepted negative_slope=.2. I have no idea where I got .1. This will break backwards compatibility with some older models but will likely improve their performance when freshly trained. I did some auditing to find what these models might be, and I am not actively using any of them, so this is probably OK.
2020-12-19 08:28:03 -07:00
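To make the compatibility break in the commit above concrete, here is a small plain-Python sketch of a leaky ReLU. `leaky_relu` is a hypothetical stand-in, not the ConvGnLelu code: negative inputs are scaled by `negative_slope`, so weights trained against a slope of .1 see different activations once the default becomes .2.

```python
def leaky_relu(x: float, negative_slope: float = 0.2) -> float:
    """Identity for x >= 0; negative inputs are scaled by negative_slope."""
    return x if x >= 0 else negative_slope * x


inputs = (-2.0, 0.0, 3.0)
old = [leaky_relu(v, negative_slope=0.1) for v in inputs]  # previous default
new = [leaky_relu(v) for v in inputs]                      # new default of .2
```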
James Betker
9377d34ac3
glean mods
2020-12-19 08:26:07 -07:00
James Betker
92f9a129f7
GLEAN!
2020-12-18 16:04:19 -07:00
James Betker
c717765bcb
Notes for lucidrains converter.
2020-12-18 09:55:38 -07:00
James Betker
b4720ea377
Move stylegan to new location
2020-12-18 09:52:36 -07:00
James Betker
1708136b55
Commit my attempt at "conforming" the lucidrains stylegan implementation to the reference spec. Not working. will probably be abandoned.
2020-12-18 09:51:48 -07:00
James Betker
209332292a
Rosinality stylegan fix
2020-12-18 09:50:41 -07:00
James Betker
d875ca8342
More refactor changes
2020-12-18 09:24:31 -07:00
James Betker
5640e4efe4
More refactoring
2020-12-18 09:18:34 -07:00
James Betker
b905b108da
Large cleanup
...
Removed a lot of old code that I won't be touching again. Refactored some
code elements into more logical places.
2020-12-18 09:10:44 -07:00
James Betker
3074f41877
Get rosinality model converter to work
...
Mostly, just needed to remove the custom cuda ops, not so bueno on Windows.
2020-12-17 16:03:39 -07:00
James Betker
e838c6e75b
Rosinality stylegan2 port
2020-12-17 14:18:46 -07:00
James Betker
49327b99fe
SRFlow outputs RRDB output
2020-12-16 10:28:02 -07:00
James Betker
c25b49bb12
Clean up of SRFlowNet_arch
2020-12-16 10:27:38 -07:00
James Betker
42ac8e3eeb
Remove unnecessary comment from SRFlowNet
2020-12-16 09:43:07 -07:00
James Betker
09de3052ac
Add softmax to spinenet classification head
2020-12-16 09:42:15 -07:00
James Betker
8661207d57
Merge branch 'gan_lab' of https://github.com/neonbjb/DL-Art-School into gan_lab
2020-12-15 17:16:48 -07:00
James Betker
fc376d34b2
Spinenet with logits head
2020-12-15 17:16:19 -07:00
James Betker
0a19e53df0
BYOL mods
2020-12-14 23:59:11 -07:00
James Betker
ef7eabf457
Allow RRDB to upscale 8x
2020-12-14 23:58:52 -07:00
James Betker
ec0ee25f4b
Structural latents checkpoint
2020-12-11 12:01:09 -07:00
James Betker
26ceca68c0
BYOL with structure!
2020-12-10 15:07:35 -07:00
James Betker
c203cee31e
Allow swapping to torch DDP as needed in code
2020-12-09 15:03:59 -07:00
James Betker
97ff25a086
BYOL!
...
Man, is there anything ExtensibleTrainer can't train? :)
2020-12-08 13:07:53 -07:00
James Betker
bca59ed98a
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-12-07 12:51:04 -07:00
James Betker
ea56eb61f0
Fix DDP errors for discriminator
...
- Don't define training_net in define_optimizers - this drops the shell and leads to problems downstream
- Get rid of support for multiple training nets per opt. This was half baked and needs a better solution if needed
downstream.
2020-12-07 12:50:57 -07:00
James Betker
88fc049c8d
spinenet latent playground!
2020-12-05 20:30:36 -07:00
James Betker
11155aead4
Directly use dataset keys
...
This has been a long time coming. Cleans up messy "GT" nomenclature and simplifies ExtensibleTrainer.feed_data
2020-12-04 20:14:53 -07:00
James Betker
8a83b1c716
Go back to apex DDP, fix distributed bugs
2020-12-04 16:39:21 -07:00
James Betker
7a81d4e2f4
Revert gaussian loss changes
2020-12-04 12:49:20 -07:00
James Betker
711780126e
Cleanup
2020-12-03 23:42:51 -07:00
James Betker
ac7256d4a3
Do tqdm reporting when calculating flow_gaussian_nll
2020-12-03 23:42:29 -07:00
James Betker
dc9ff8e05b
Allow the majority of the srflow steps to checkpoint
2020-12-03 23:41:57 -07:00
James Betker
06d1c62c5a
iGPT support!
...
Sweeeeet
2020-12-03 15:32:21 -07:00
James Betker
c18adbd606
Delete mdcn & panet
...
Garbage, all of it.
2020-12-02 22:25:57 -07:00
James Betker
f2880b33c9
Get rid of mean shift from MDCN
2020-12-02 14:18:33 -07:00
James Betker
8a00f15746
Implement FlowGaussianNll evaluator
2020-12-02 14:09:54 -07:00
James Betker
edf408508c
Fix discriminator
2020-12-01 17:45:56 -07:00
James Betker
9a421a41f4
SRFlow: accommodate mismatches between global scale and flow_scale
2020-12-01 11:11:51 -07:00
James Betker
e343722d37
Add stepped rrdb
2020-12-01 11:11:15 -07:00
James Betker
2e0bbda640
Remove unused archs
2020-12-01 11:10:48 -07:00
James Betker
a1c8300052
Add mdcn
2020-11-30 16:14:21 -07:00
James Betker
1e0f69e34b
extra_conv in gn discriminator, multiframe support in rrdb.
2020-11-29 15:39:50 -07:00
James Betker
da604752e6
Misc RRDB changes
2020-11-29 12:21:31 -07:00
James Betker
a1d4c9f83c
multires rrdb work
2020-11-28 14:35:46 -07:00
James Betker
929cd45c05
Fix for RRDB scale
2020-11-27 21:37:10 -07:00
James Betker
71fa532356
Adjustments to how flow networks set size and scale
2020-11-27 21:37:00 -07:00
James Betker
6f958bb150
Maybe this is necessary after all?
2020-11-27 15:21:13 -07:00
James Betker
ef8d5f88c1
Bring split gaussian nll out of split so it can be computed accurately with the rest of the nll component
2020-11-27 13:30:21 -07:00
James Betker
4ab49b0d69
RRDB disc work
2020-11-27 12:03:08 -07:00
James Betker
6de4dabb73
Remove srflow (modified version)
...
Starting from orig and re-working from there.
2020-11-27 12:02:06 -07:00
James Betker
fd356580c0
Play with lambdas
2020-11-26 20:30:55 -07:00
James Betker
cb045121b3
Expose srflow rrdb
2020-11-24 13:20:20 -07:00
James Betker
f6098155cd
Mods to tecogan to allow use of embeddings as input
2020-11-24 09:24:02 -07:00
James Betker
b10bcf6436
Rework stylegan_for_sr to incorporate structure as an adain block
2020-11-23 11:31:11 -07:00
James Betker
519ba6f10c
Support 2x RRDB with 4x srflow
2020-11-21 14:46:15 -07:00
James Betker
cad92bada8
Report logp and logdet for srflow
2020-11-21 10:13:05 -07:00
James Betker
c37d3faa58
More adjustments to srflow_orig
2020-11-20 19:38:33 -07:00
James Betker
d51d12a41a
Adjustments to srflow to (maybe?) fix training
2020-11-20 14:44:24 -07:00
James Betker
6c8c35ac47
Support training RRDB encoder [srflow]
2020-11-20 10:03:06 -07:00
James Betker
5ccdbcefe3
srflow_orig integration
2020-11-19 23:47:24 -07:00
James Betker
2b2d754d8e
Bring in an original SRFlow implementation for reference
2020-11-19 21:42:39 -07:00
James Betker
1e0d7be3ce
"Clean up" SRFlow
2020-11-19 21:42:24 -07:00
James Betker
d7877d0a36
Fixes to teco losses and translational losses
2020-11-19 11:35:05 -07:00
James Betker
5c10264538
Remove pyramid_disc hard dependencies
2020-11-17 18:34:11 -07:00
James Betker
6b679e2b51
Make grad_penalty available to classical discs
2020-11-17 18:31:40 -07:00
James Betker
8a19c9ae15
Add additive mode to rrdb
2020-11-16 20:45:09 -07:00
James Betker
2a507987df
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-11-15 16:16:30 -07:00
James Betker
931ed903c1
Allow combined additive loss
2020-11-15 16:16:18 -07:00
James Betker
4b68116977
import fix
2020-11-15 16:15:42 -07:00
James Betker
98eada1e4c
More circular dependency fixes + unet fixes
2020-11-15 11:53:35 -07:00
James Betker
e587d549f7
Fix circular imports
2020-11-15 11:32:35 -07:00
James Betker
99f0cfaab5
Rework stylegan2 divergence losses
...
Notably: include unet loss
2020-11-15 11:26:44 -07:00
James Betker
ea94b93a37
Fixes for unet
2020-11-15 10:38:33 -07:00
James Betker
89f56b2091
Fix another import
2020-11-14 22:10:45 -07:00
James Betker
9af049c671
Import fix for unet
2020-11-14 22:09:18 -07:00
James Betker
5cade6b874
Move stylegan2 around, bring in unet
2020-11-14 22:04:48 -07:00
James Betker
125cb16dce
Add a FID evaluator for stylegan with structural guidance
2020-11-14 20:16:07 -07:00
James Betker
c9258e2da3
Alter how structural guidance is given to stylegan
2020-11-14 20:15:48 -07:00
James Betker
3397c83447
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-11-14 09:30:09 -07:00
James Betker
423ee7cb90
Allow attention to be specified for stylegan2
2020-11-14 09:29:53 -07:00
James Betker
f406a5dd4c
Mods to support stylegan2 in SR mode
2020-11-13 20:11:50 -07:00
James Betker
9c3d0b7560
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-11-13 20:10:47 -07:00
James Betker
67bf55495b
Allow hq_batched_key to be specified
2020-11-13 20:10:12 -07:00
James Betker
0b96811611
Fix another issue with gpu ids getting thrown all over the place
2020-11-13 20:05:52 -07:00
James Betker
a07e1a7292
Add separate Evaluator module and FID evaluator
2020-11-13 11:03:54 -07:00
James Betker
080ad61be4
Add option to work with nonrandom latents
2020-11-12 21:23:50 -07:00
James Betker
566b99ca75
GP adjustments for stylegan2
2020-11-12 16:44:51 -07:00
James Betker
44a19cd37c
ExtensibleTrainer mods to support advanced checkpointing for stylegan2
...
Basically: stylegan2 makes use of gradient-based normalizers. These
make it so that I cannot use gradient checkpointing. But I love gradient
checkpointing. It makes things really, really fast and memory conscious.
So - skip checkpointing only when we run the regularizer loss. This is a
bit messy, but speeds up training by at least 20%.
Also: pytorch: please make checkpointing a first class citizen.
2020-11-12 15:45:07 -07:00
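The toggle described in the commit above might look roughly like the following. This is a sketch under stated assumptions, not ExtensibleTrainer's code: `run_block` and its `regularizing` flag are hypothetical, and the checkpoint function is injected so the example runs without PyTorch (in a real setup it would presumably be `torch.utils.checkpoint.checkpoint`).

```python
from typing import Any, Callable, List


def run_block(fn: Callable[..., Any], *args: Any,
              checkpoint_fn: Callable[..., Any], regularizing: bool) -> Any:
    """Run fn through checkpoint_fn unless this is a regularizer step."""
    if regularizing:
        # Gradient-based regularizers: call the block directly.
        return fn(*args)
    return checkpoint_fn(fn, *args)


# Stand-in for a real checkpoint function; it only records that it was used.
calls: List[str] = []


def fake_checkpoint(fn: Callable[..., Any], *args: Any) -> Any:
    calls.append(fn.__name__)
    return fn(*args)


def double(x: int) -> int:
    return 2 * x


normal = run_block(double, 3, checkpoint_fn=fake_checkpoint, regularizing=False)
regularized = run_block(double, 3, checkpoint_fn=fake_checkpoint, regularizing=True)
```

Both calls return the same value; only the non-regularizer step goes through the checkpoint wrapper.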
James Betker
db9e9e28a0
Fix an issue where GPU0 was always being used in non-ddp
...
Frankly, I don't understand how this has ever worked. WTF.
2020-11-12 15:43:01 -07:00
James Betker
2d3449d7a5
stylegan2 in ml art school!
2020-11-12 15:42:05 -07:00
James Betker
fd97573085
Fixes
2020-11-11 21:49:06 -07:00
James Betker
88f349bdf1
Enable usage of wandb
2020-11-11 21:48:56 -07:00
James Betker
1c065c41b4
Revert "..."
...
This reverts commit 4b92191880.
2020-11-11 17:24:27 -07:00
James Betker
4b92191880
...
2020-11-11 14:12:40 -07:00
James Betker
12b57bbd03
Add residual blocks to pyramid disc
2020-11-11 13:56:45 -07:00
James Betker
b4136d766a
Back to pyramids, no rrdb
2020-11-11 13:40:24 -07:00
James Betker
42a97de756
Convert PyramidRRDBDisc to RRDBDisc
...
Had numeric stability issues. This probably makes more sense anyways.
2020-11-11 12:14:14 -07:00
James Betker
72762f200c
PyramidRRDB net
2020-11-11 11:25:49 -07:00
James Betker
a1760f8969
Adapt srg2 for video
2020-11-10 16:16:41 -07:00
James Betker
b742d1e5a5
When skipping steps via "every", still run nontrainable injection points
2020-11-10 16:09:17 -07:00
James Betker
91d27372e4
rrdb with adain latent
2020-11-10 16:08:54 -07:00
James Betker
6a2fd5f7d0
Lots of new discriminator nets
2020-11-10 16:06:54 -07:00
James Betker
4e5ba61ae7
SRG2classic further re-integration
2020-11-10 16:06:14 -07:00
James Betker
9e2c96ad5d
More latent work
2020-11-07 20:38:56 -07:00
James Betker
0cf52ef52c
latent work
2020-11-06 20:38:23 -07:00
James Betker
34d319585c
Add srflow arch
2020-11-06 20:38:04 -07:00
James Betker
4469d2e661
More work on RRDB with latent
2020-11-05 22:13:05 -07:00
James Betker
62d3b6496b
Latent work checkpoint
2020-11-05 13:31:34 -07:00
James Betker
fd6cdba88f
RRDB with latent
2020-11-05 10:04:17 -07:00
James Betker
df47d6cbbb
More work in support of training flow networks in tandem with generators
2020-11-04 18:07:48 -07:00
James Betker
658a267bab
More work on SSIM/PSNR approximators
...
- Add a network that accommodates this style of approximator while retaining structure
- Migrate to SSIM approximation
- Add a tool to visualize how these approximators are working
- Fix some issues that came up while doing this work
2020-11-03 08:09:58 -07:00
James Betker
a51daacde2
Fix reporting of d_fake_diff for generators
2020-11-02 08:45:46 -07:00
James Betker
dcfe994fee
Add standalone srg2_classic
...
Trying to investigate how I was so misguided. I *thought* srg2 was considerably
better than RRDB in performance but am not actually seeing that.
2020-10-31 20:55:34 -06:00
James Betker
eb7df63592
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-31 11:09:32 -06:00
James Betker
c2866ad8d2
Disable debugging of comparable pingpong generations
2020-10-31 11:09:10 -06:00
James Betker
7303d8c932
Add psnr approximator
2020-10-31 11:08:55 -06:00
James Betker
565517814e
Restore SRG2
...
Going to try to figure out where SRG lost competitiveness to RRDB..
2020-10-30 14:01:56 -06:00
James Betker
74738489b9
Fixes and additional support for progressive zoom
2020-10-30 09:59:54 -06:00
James Betker
a3918fa808
Tecogan & other fixes
2020-10-30 00:19:58 -06:00
James Betker
b316078a15
Fix tecogan_losses fp16
2020-10-29 23:02:20 -06:00
James Betker
3791f95ad0
Enable RRDB to take in reference inputs
2020-10-29 11:07:40 -06:00
James Betker
7d38381d46
Add scaling to rrdb
2020-10-29 09:48:10 -06:00
James Betker
607ff3c67c
RRDB with bypass
2020-10-29 09:39:45 -06:00
James Betker
1655b9e242
Fix fast_forward teco loss bug
2020-10-28 17:49:54 -06:00
James Betker
515905e904
Add a min_loss that is DDP compatible
2020-10-28 15:46:59 -06:00
James Betker
f133243ac8
Extra logging for teco_resgen
2020-10-28 15:21:22 -06:00
James Betker
2ab5054d4c
Add noise to teco disc
2020-10-27 22:48:23 -06:00
James Betker
4dc16d5889
Upgrade tecogan_losses for speed
2020-10-27 22:40:15 -06:00
James Betker
ac3da0c5a6
Make tecogen functional
2020-10-27 21:08:59 -06:00
James Betker
10da206db6
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-27 20:59:59 -06:00
James Betker
9848f4c6cb
Add teco_resgen
2020-10-27 20:59:55 -06:00
James Betker
543c384a91
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-27 20:59:16 -06:00
James Betker
da53090ce6
More adjustments to support distributed training with teco & on multi_modal_train
2020-10-27 20:58:03 -06:00
James Betker
00bb568956
further checkpointify spsr_arch
2020-10-27 17:54:28 -06:00
James Betker
c2727a0150
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-27 15:24:19 -06:00
James Betker
2a3eec8fd7
Fix some distributed training snafus
2020-10-27 15:24:05 -06:00
James Betker
d923a62ed3
Allow SPSR to checkpoint
2020-10-27 15:23:20 -06:00
James Betker
11a9e223a6
Retrofit SPSR_arch so it is capable of accepting a ref
2020-10-27 11:14:36 -06:00
James Betker
8202ee72b9
Re-add original SPSR_arch
2020-10-27 11:00:38 -06:00
James Betker
231137ab0a
Revert RRDB back to original model
2020-10-27 10:25:31 -06:00
James Betker
1ce863849a
Remove temporary base_model change
2020-10-26 11:13:01 -06:00
James Betker
54accfa693
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-26 11:12:37 -06:00
James Betker
ff58c6484a
Fixes to unified chunk datasets to support stereoscopic training
2020-10-26 11:12:22 -06:00
James Betker
f857eb00a8
Allow tecogan losses to compute at 32px
2020-10-26 11:09:55 -06:00
James Betker
629b968901
ChainedGen 4x alteration
...
Increases conv window for teco_recurrent in the 4x case so all data
can be used.
base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
85c07f85d9
Update flownet submodule
2020-10-24 11:59:00 -06:00
James Betker
9c3d059ef0
Updates to be able to train flownet2 in ExtensibleTrainer
...
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
1dbcbfbac8
Restore ChainedEmbeddingGenWithStructure
...
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
7a75d10784
Arch cleanup
2020-10-23 09:35:33 -06:00
James Betker
646d6a621a
Support 4x zoom on ChainedEmbeddingGen
2020-10-23 09:25:58 -06:00
James Betker
e9c0b9f0fd
More adjustments to support multi-modal training
...
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f
Class-ify train.py and work on multi-modal trainer
2020-10-22 16:15:31 -06:00
James Betker
15e00e9014
Finish integration with autocast
...
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721
Move to torch.cuda.amp (not working)
...
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3
Add multi-modal trainer
2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8
Fix multifaceted chain gen
2020-10-22 13:27:06 -06:00
James Betker
43c4f92123
Collapse progressive zoom candidates into the batch dimension
...
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
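Collapsing candidates into the batch dimension, as in the commit above, amounts to a reshape. The plain-Python sketch below uses nested lists in place of tensors; `collapse` and `expand` are hypothetical names, and the string labels stand in for image tensors.

```python
from typing import List, TypeVar

T = TypeVar("T")


def collapse(nested: List[List[T]]) -> List[T]:
    """Flatten [batch][candidate] into a single batch of batch * candidates."""
    return [cand for candidates in nested for cand in candidates]


def expand(flat: List[T], per_item: int) -> List[List[T]]:
    """Inverse of collapse: regroup into per_item candidates per batch item."""
    return [flat[i:i + per_item] for i in range(0, len(flat), per_item)]


# Two batch items, three zoom candidates each.
nested = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
flat = collapse(nested)            # one loss call can now see all six
restored = expand(flat, per_item=3)
```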
James Betker
680d635420
Enable ExtensibleTrainer to skip steps when state keys are missing
2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1
Add FFT injector
2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca
Add a ChainedEmbeddingGen which can be simultaneously used with multiple training paradigms
2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0
Allow recurrent losses to be weighted
2020-10-21 16:59:44 -06:00
James Betker
5753e77d67
ChainedGen: Output debugging information on blocks
2020-10-21 16:36:23 -06:00
James Betker
3c6e600e48
Add capacity for models to self-report visuals
2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b
Add bypass to ChainedEmbeddingGen
2020-10-21 11:07:45 -06:00
James Betker
a63bf2ea2f
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086
Restore test.py for use as standalone validator
2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8
Fix recurrent=None bug in ChainedEmbeddingGen
2020-10-19 15:25:12 -06:00
James Betker
b28e4d9cc7
Add spread loss
...
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
981d64413b
Support validation over a custom injector
...
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
668cafa798
Push correct patch of recurrent embedding to upstream image, rather than whole thing
2020-10-18 22:39:52 -06:00
James Betker
7df378a944
Remove separated vgg discriminator
...
Checkpointing happens inline instead. Was a dumb idea..
Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5
Fix memory leak with recurrent loss
2020-10-18 10:22:10 -06:00
James Betker
552e70a032
Get rid of excessive checkpointed disc params
2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813
Add a checkpointable discriminator
2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08
Multiscale training in!
2020-10-17 22:54:12 -06:00
James Betker
e706911c83
Fix spinenet bug
2020-10-17 20:20:36 -06:00
James Betker
b008a27d39
Spinenet should allow bypassing the initial conv
...
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c1c9c5681f
Swap recurrence
2020-10-17 08:40:28 -06:00
James Betker
6141aa1110
More recurrence fixes for chainedgen
2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b
Allow recurrence to be specified for chainedgen
2020-10-17 08:32:29 -06:00
James Betker
fc4c064867
Add recurrent support to chainedgenwithstructure
2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2
Don't use several stages of spinenet_arch
...
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339
Go back to torch's DDP
...
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e
Add ChainedGenWithStructure
2020-10-16 20:44:36 -06:00
James Betker
617d97e19d
Add ChainedEmbeddingGen
2020-10-15 23:18:08 -06:00
James Betker
c4543ce124
Set post_transform_block to None where applicable
2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb
SSGSimpler network
2020-10-15 17:18:44 -06:00
James Betker
eda75c9779
Cleanup fixes
2020-10-15 10:13:17 -06:00
James Betker
920865defb
Arch work
2020-10-15 10:13:06 -06:00
James Betker
1f20d59c31
Revert big switch back
2020-10-14 11:03:34 -06:00
James Betker
24792bdb4f
Codebase cleanup
...
Removed a lot of legacy stuff I have no intention of using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba
Mods to support video processing with teco networks
2020-10-13 20:47:05 -06:00
James Betker
17d78195ee
Mods to SRG to support returning switch logits
2020-10-13 20:46:37 -06:00
James Betker
cc915303a5
Fix SPSR calls into SwitchComputer
2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
...
# Conflicts:
# codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9
Add the "BigSwitch"
2020-10-13 10:11:10 -06:00
James Betker
8014f050ac
Clear metrics properly
...
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-12 17:43:51 -06:00
James Betker
731700ab2c
checkpoint in ssg
2020-10-12 17:43:28 -06:00
James Betker
ca523215c6
Fix recurrent std in arch
2020-10-12 17:42:32 -06:00
James Betker
05377973bf
Allow initial recurrent input to be specified (optionally)
2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6
Add ssgr1 recurrence
2020-10-12 17:18:19 -06:00
James Betker
d7d7590f3e
Fix constant injector - wasn't working in test
2020-10-12 10:36:30 -06:00
James Betker
ce163ad4a9
Update SSGdeep
2020-10-12 10:22:08 -06:00
James Betker
3409d88a1c
Add PANet arch
2020-10-12 10:20:55 -06:00
James Betker
a9c2e97391
Constant injector and teco fixes
2020-10-11 08:20:07 -06:00
James Betker
e785029936
Mods needed to support SPSR archs with teco gan
2020-10-10 22:39:55 -06:00
James Betker
120072d464
Add constant injector
2020-10-10 21:50:23 -06:00
James Betker
f99812e14d
Fix tecogan_losses errors
2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7
Alter teco_losses to feed a recurrent input in as separate
2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d
Add MarginRemoval injector
2020-10-09 20:35:56 -06:00
James Betker
0011d445c8
Fix loss indexing
2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc
For element loss added
2020-10-09 19:51:44 -06:00
James Betker
fe50d6f9d0
Fix attention images
2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c
Allow tecogan to be used in process_video
2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69
Add network architecture built for teco
2020-10-09 08:40:14 -06:00
James Betker
afe6af88af
Fix attention print issue
2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4
Converge SSG architectures into unified switching base class
...
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
1eb516d686
Fix more distributed bugs
2020-10-08 14:32:45 -06:00
James Betker
fba29d7dcc
Move to apex distributeddataparallel and add switch all_reduce
...
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5
Allow tecogan to support generators that only output a tensor (instead of a list)
2020-10-08 09:26:25 -06:00
James Betker
969bcd9021
Use local checkpoint in SSG
2020-10-08 08:54:46 -06:00
James Betker
c93dd623d7
Tecogan losses work
2020-10-07 23:11:58 -06:00
James Betker
c96f5b2686
Import switched_conv as a submodule
2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4
More tecogan fixes
2020-10-07 12:41:17 -06:00
James Betker
1c44d395af
Tecogan work
...
It's training! There are probably still plenty of bugs, though.
2020-10-07 09:03:30 -06:00
James Betker
e9d7371a61
Add concatenate injector
2020-10-07 09:02:42 -06:00
James Betker
8a7e993aea
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-06 20:41:58 -06:00
James Betker
1e415b249b
Add tag that can be applied to prevent parameter training
2020-10-06 20:39:49 -06:00
James Betker
2f2e3f33f8
StackedSwitchedGenerator_5lyr
2020-10-06 20:39:32 -06:00
James Betker
6217b48e3f
Fix spsr_arch bug
2020-10-06 20:38:47 -06:00
James Betker
cffc596141
Integrate flownet2 into codebase, add teco visual debugs
2020-10-06 20:35:39 -06:00
James Betker
e4b89a172f
Reduce spsr7 memory usage
2020-10-05 22:05:56 -06:00
James Betker
4111942ada
Support attention deferral in deep ssgr
2020-10-05 19:35:55 -06:00
James Betker
840927063a
Work on tecogan losses
2020-10-05 19:35:28 -06:00
James Betker
2875822024
SPSR9 arch
...
takes some of the stuff I learned with SGSR yesterday and applies it to spsr
2020-10-05 08:47:51 -06:00
James Betker
51044929af
Don't compute attention statistics on multiple generator invocations of the same data
2020-10-05 00:34:29 -06:00
James Betker
e760658fdb
Another fix..
2020-10-04 21:08:00 -06:00
James Betker
a890e3a9c0
Fix geometric loss not handling 0 index
2020-10-04 21:05:01 -06:00
James Betker
c3ef8a4a31
Stacked switches - return a tuple
2020-10-04 21:02:24 -06:00
James Betker
13f97e1e97
Add recursive loss
2020-10-04 20:48:15 -06:00
James Betker
ffd069fd97
Lots of SSG work
...
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
aca2c7ab41
Fully checkpoint-ize SSG1
2020-10-04 18:24:52 -06:00
James Betker
e3294939b0
Revert "SSG: offer option to use BN-based attention normalization"
...
Didn't work. Oh well.
This reverts commit 5cd2b37591.
2020-10-03 17:54:53 -06:00
James Betker
5cd2b37591
SSG: offer option to use BN-based attention normalization
...
Not sure how this is going to work, let's try it.
2020-10-03 16:16:19 -06:00
James Betker
9b4ed82093
Get rid of unused convs in spsr7
2020-10-03 11:36:26 -06:00
James Betker
3561cc164d
Fix up fea_loss calculator (for validation)
...
Not sure how this was working in regular training mode, but it
was failing in DDP.
2020-10-03 11:19:20 -06:00
James Betker
6c9718ad64
Don't log if you aren't 0 rank
2020-10-03 11:14:13 -06:00
James Betker
922b1d76df
Don't record visuals when not on rank 0
2020-10-03 11:10:03 -06:00
James Betker
8197fd646f
Don't accumulate losses for metrics when the loss isn't a tensor
2020-10-03 11:03:55 -06:00
James Betker
19a4075e1e
Allow checkpointing to be disabled in the options file
...
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
dd9d7b27ac
Add more sophisticated mechanism for balancing GAN losses
2020-10-02 22:53:42 -06:00
James Betker
39865ca3df
TOTAL_loss, dumbo
2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655
Loss accumulator fix
2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4
ExtensibleTrainer - don't compute backward when there is no loss
2020-10-02 20:54:06 -06:00