Commit Graph

301 Commits

Author SHA1 Message Date
James Betker
00bb568956 further checkpointify spsr_arch 2020-10-27 17:54:28 -06:00
James Betker
d923a62ed3 Allow SPSR to checkpoint 2020-10-27 15:23:20 -06:00
James Betker
11a9e223a6 Retrofit SPSR_arch so it is capable of accepting a ref 2020-10-27 11:14:36 -06:00
James Betker
8202ee72b9 Re-add original SPSR_arch 2020-10-27 11:00:38 -06:00
James Betker
231137ab0a Revert RRDB back to original model 2020-10-27 10:25:31 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
1dbcbfbac8 Restore ChainedEmbeddingGenWithStructure
Still using this guy, after all
2020-10-24 11:54:52 -06:00
James Betker
7a75d10784 Arch cleanup 2020-10-23 09:35:33 -06:00
James Betker
646d6a621a Support 4x zoom on ChainedEmbeddingGen 2020-10-23 09:25:58 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
40dc2938e8 Fix multifaceted chain gen 2020-10-22 13:27:06 -06:00
James Betker
1ef559d7ca Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms 2020-10-21 22:21:51 -06:00
James Betker
5753e77d67 ChainedGen: Output debugging information on blocks 2020-10-21 16:36:23 -06:00
James Betker
dca5cddb3b Add bypass to ChainedEmbeddingGen 2020-10-21 11:07:45 -06:00
James Betker
a63bf2ea2f Merge remote-tracking branch 'origin/gan_lab' into gan_lab 2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086 Restore test.py for use as standalone validator 2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8 Fix recurrent=None bug in ChainedEmbeddingGen 2020-10-19 15:25:12 -06:00
James Betker
7df378a944 Remove separated vgg discriminator
Checkpointing happens inline instead. Was a dumb idea..

Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
552e70a032 Get rid of excessive checkpointed disc params 2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813 Add a checkpointable discriminator 2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08 Multiscale training in! 2020-10-17 22:54:12 -06:00
James Betker
e706911c83 Fix spinenet bug 2020-10-17 20:20:36 -06:00
James Betker
b008a27d39 Spinenet should allow bypassing the initial conv
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c1c9c5681f Swap recurrence 2020-10-17 08:40:28 -06:00
James Betker
6141aa1110 More recurrence fixes for chainedgen 2020-10-17 08:35:46 -06:00
James Betker
fc4c064867 Add recurrent support to chainedgenwithstructure 2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2 Don't use several stages of spinenet_arch
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d856378b2e Add ChainedGenWithStructure 2020-10-16 20:44:36 -06:00
James Betker
617d97e19d Add ChainedEmbeddingGen 2020-10-15 23:18:08 -06:00
James Betker
c4543ce124 Set post_transform_block to None where applicable 2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb SSGSimpler network 2020-10-15 17:18:44 -06:00
James Betker
920865defb Arch work 2020-10-15 10:13:06 -06:00
James Betker
1f20d59c31 Revert big switch back 2020-10-14 11:03:34 -06:00
James Betker
17d78195ee Mods to SRG to support returning switch logits 2020-10-13 20:46:37 -06:00
James Betker
cc915303a5 Fix SPSR calls into SwitchComputer 2020-10-13 10:14:47 -06:00
James Betker
9a5d6162e9 Add the "BigSwitch" 2020-10-13 10:11:10 -06:00
James Betker
ca523215c6 Fix recurrent std in arch 2020-10-12 17:42:32 -06:00
James Betker
597b6e92d6 Add ssgr1 recurrence 2020-10-12 17:18:19 -06:00
James Betker
ce163ad4a9 Update SSGdeep 2020-10-12 10:22:08 -06:00
James Betker
3409d88a1c Add PANet arch 2020-10-12 10:20:55 -06:00
James Betker
e785029936 Mods needed to support SPSR archs with teco gan 2020-10-10 22:39:55 -06:00
James Betker
fe50d6f9d0 Fix attention images 2020-10-09 19:21:55 -06:00
James Betker
58d8bf8f69 Add network architecture built for teco 2020-10-09 08:40:14 -06:00
James Betker
afe6af88af Fix attention print issue 2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4 Converge SSG architectures into unified switching base class
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
969bcd9021 Use local checkpoint in SSG 2020-10-08 08:54:46 -06:00
James Betker
c96f5b2686 Import switched_conv as a submodule 2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4 More tecogan fixes 2020-10-07 12:41:17 -06:00
James Betker
2f2e3f33f8 StackedSwitchedGenerator_5lyr 2020-10-06 20:39:32 -06:00