Commit Graph

526 Commits

Author SHA1 Message Date
James Betker
680d635420 Enable ExtensibleTrainer to skip steps when state keys are missing 2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1 Add FFT injector 2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms 2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0 Allow recurrent losses to be weighted 2020-10-21 16:59:44 -06:00
James Betker
5753e77d67 ChainedGen: Output debugging information on blocks 2020-10-21 16:36:23 -06:00
James Betker
3c6e600e48 Add capacity for models to self-report visuals 2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b Add bypass to ChainedEmbeddingGen 2020-10-21 11:07:45 -06:00
James Betker
a63bf2ea2f Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086 Restore test.py for use as standalone validator 2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8 Fix recurrent=None bug in ChainedEmbeddingGen 2020-10-19 15:25:12 -06:00
James Betker
b28e4d9cc7 Add spread loss
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
981d64413b Support validation over a custom injector
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
668cafa798 Push correct patch of recurrent embedding to upstream image, rather than whole thing 2020-10-18 22:39:52 -06:00
James Betker
7df378a944 Remove separated vgg discriminator
Checkpointing happens inline instead. Was a dumb idea..

Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5 Fix memory leak with recurrent loss 2020-10-18 10:22:10 -06:00
James Betker
552e70a032 Get rid of excessive checkpointed disc params 2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813 Add a checkpointable discriminator 2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08 Multiscale training in! 2020-10-17 22:54:12 -06:00
James Betker
e706911c83 Fix spinenet bug 2020-10-17 20:20:36 -06:00
James Betker
b008a27d39 Spinenet should allow bypassing the initial conv
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c1c9c5681f Swap recurrence 2020-10-17 08:40:28 -06:00
James Betker
6141aa1110 More recurrence fixes for chainedgen 2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b Allow recurrence to specified for chainedgen 2020-10-17 08:32:29 -06:00
James Betker
fc4c064867 Add recurrent support to chainedgenwithstructure 2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2 Don't use several stages of spinenet_arch
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339 Go back to torch's DDP
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e Add ChainedGenWithStructure 2020-10-16 20:44:36 -06:00
James Betker
617d97e19d Add ChainedEmbeddingGen 2020-10-15 23:18:08 -06:00
James Betker
c4543ce124 Set post_transform_block to None where applicable 2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb SSGSimpler network 2020-10-15 17:18:44 -06:00
James Betker
eda75c9779 Cleanup fixes 2020-10-15 10:13:17 -06:00
James Betker
920865defb Arch work 2020-10-15 10:13:06 -06:00
James Betker
1f20d59c31 Revert big switch back 2020-10-14 11:03:34 -06:00
James Betker
24792bdb4f Codebase cleanup
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba Mods to support video processing with teco networks 2020-10-13 20:47:05 -06:00
James Betker
17d78195ee Mods to SRG to support returning switch logits 2020-10-13 20:46:37 -06:00
James Betker
cc915303a5 Fix SPSR calls into SwitchComputer 2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899 Merge remote-tracking branch 'origin/gan_lab' into gan_lab
# Conflicts:
#	codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9 Add the "BigSwitch" 2020-10-13 10:11:10 -06:00
James Betker
8014f050ac Clear metrics properly
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60 Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-12 17:43:51 -06:00
James Betker
731700ab2c checkpoint in ssg 2020-10-12 17:43:28 -06:00
James Betker
ca523215c6 Fix recurrent std in arch 2020-10-12 17:42:32 -06:00
James Betker
05377973bf Allow initial recurrent input to be specified (optionally) 2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6 Add ssgr1 recurrence 2020-10-12 17:18:19 -06:00
James Betker
d7d7590f3e Fix constant injector - wasn't working in test 2020-10-12 10:36:30 -06:00
James Betker
ce163ad4a9 Update SSGdeep 2020-10-12 10:22:08 -06:00
James Betker
3409d88a1c Add PANet arch 2020-10-12 10:20:55 -06:00
James Betker
a9c2e97391 Constant injector and teco fixes 2020-10-11 08:20:07 -06:00
James Betker
e785029936 Mods needed to support SPSR archs with teco gan 2020-10-10 22:39:55 -06:00
James Betker
120072d464 Add constant injector 2020-10-10 21:50:23 -06:00
James Betker
f99812e14d Fix tecogan_losses errors 2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7 Alter teco_losses to feed a recurrent input in as separate 2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d Add MarginRemoval injector 2020-10-09 20:35:56 -06:00
James Betker
0011d445c8 Fix loss indexing 2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc For element loss added 2020-10-09 19:51:44 -06:00
James Betker
fe50d6f9d0 Fix attention images 2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c Allow tecogan to be used in process_video 2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69 Add network architecture built for teco 2020-10-09 08:40:14 -06:00
James Betker
afe6af88af Fix attention print issue 2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4 Converge SSG architectures into unified switching base class
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
1eb516d686 Fix more distributed bugs 2020-10-08 14:32:45 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5 Allow tecogan to support generators that only output a tensor (instead of a list) 2020-10-08 09:26:25 -06:00
James Betker
969bcd9021 Use local checkpoint in SSG 2020-10-08 08:54:46 -06:00
James Betker
c93dd623d7 Tecogan losses work 2020-10-07 23:11:58 -06:00
James Betker
c96f5b2686 Import switched_conv as a submodule 2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4 More tecogan fixes 2020-10-07 12:41:17 -06:00
James Betker
1c44d395af Tecogan work
Its training!  There's still probably plenty of bugs though..
2020-10-07 09:03:30 -06:00
James Betker
e9d7371a61 Add concatenate injector 2020-10-07 09:02:42 -06:00
James Betker
8a7e993aea Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-06 20:41:58 -06:00
James Betker
1e415b249b Add tag that can be applied to prevent parameter training 2020-10-06 20:39:49 -06:00
James Betker
2f2e3f33f8 StackedSwitchedGenerator_5lyr 2020-10-06 20:39:32 -06:00
James Betker
6217b48e3f Fix spsr_arch bug 2020-10-06 20:38:47 -06:00
James Betker
cffc596141 Integrate flownet2 into codebase, add teco visual debugs 2020-10-06 20:35:39 -06:00
James Betker
e4b89a172f Reduce spsr7 memory usage 2020-10-05 22:05:56 -06:00
James Betker
4111942ada Support attention deferral in deep ssgr 2020-10-05 19:35:55 -06:00
James Betker
840927063a Work on tecogan losses 2020-10-05 19:35:28 -06:00
James Betker
2875822024 SPSR9 arch
takes some of the stuff I learned with SGSR yesterday and applies it to spsr
2020-10-05 08:47:51 -06:00
James Betker
51044929af Don't compute attention statistics on multiple generator invocations of the same data 2020-10-05 00:34:29 -06:00
James Betker
e760658fdb Another fix.. 2020-10-04 21:08:00 -06:00
James Betker
a890e3a9c0 Fix geometric loss not handling 0 index 2020-10-04 21:05:01 -06:00
James Betker
c3ef8a4a31 Stacked switches - return a tuple 2020-10-04 21:02:24 -06:00
James Betker
13f97e1e97 Add recursive loss 2020-10-04 20:48:15 -06:00
James Betker
ffd069fd97 Lots of SSG work
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
aca2c7ab41 Full checkpoint-ize SSG1 2020-10-04 18:24:52 -06:00
James Betker
e3294939b0 Revert "SSG: offer option to use BN-based attention normalization"
Didn't work. Oh well.

This reverts commit 5cd2b37591.
2020-10-03 17:54:53 -06:00
James Betker
5cd2b37591 SSG: offer option to use BN-based attention normalization
Not sure how this is going to work, lets try it.
2020-10-03 16:16:19 -06:00
James Betker
9b4ed82093 Get rid of unused convs in spsr7 2020-10-03 11:36:26 -06:00
James Betker
3561cc164d Fix up fea_loss calculator (for validation)
Not sure how this was working in regular training mode, but it
was failing in DDP.
2020-10-03 11:19:20 -06:00
James Betker
6c9718ad64 Don't log if you aren't 0 rank 2020-10-03 11:14:13 -06:00
James Betker
922b1d76df Don't record visuals when not on rank 0 2020-10-03 11:10:03 -06:00
James Betker
8197fd646f Don't accumulate losses for metrics when the loss isn't a tensor 2020-10-03 11:03:55 -06:00
James Betker
19a4075e1e Allow checkpointing to be disabled in the options file
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
dd9d7b27ac Add more sophisticated mechanism for balancing GAN losses 2020-10-02 22:53:42 -06:00
James Betker
39865ca3df TOTAL_loss, dumbo 2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655 Loss accumulator fix 2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4 ExtensibleTrainer - don't compute backward when there is no loss 2020-10-02 20:54:06 -06:00
James Betker
146a9125f2 Modify geometric & translational losses so they can be used with embeddings 2020-10-02 20:40:13 -06:00
James Betker
e30a1443cd Change sw2 refs 2020-10-02 09:01:18 -06:00