James Betker
76789a456f
Class-ify train.py and workon multi-modal trainer
2020-10-22 16:15:31 -06:00
James Betker
15e00e9014
Finish integration with autocast
...
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721
Move to torch.cuda.amp (not working)
...
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
3e3d2af1f3
Add multi-modal trainer
2020-10-22 13:27:32 -06:00
James Betker
40dc2938e8
Fix multifaceted chain gen
2020-10-22 13:27:06 -06:00
James Betker
f9dc472f63
Misc nonfunctional mods to datasets
2020-10-22 10:16:17 -06:00
James Betker
43c4f92123
Collapse progressive zoom candidates into the batch dimension
...
This contributes a significant speedup to training this type of network
since losses can operate on the entire prediction spectrum at once.
2020-10-21 22:37:23 -06:00
James Betker
680d635420
Enable ExtensibleTrainer to skip steps when state keys are missing
2020-10-21 22:22:28 -06:00
James Betker
d1175f0de1
Add FFT injector
2020-10-21 22:22:00 -06:00
James Betker
1ef559d7ca
Add a ChainedEmbeddingGen which can be simueltaneously used with multiple training paradigms
2020-10-21 22:21:51 -06:00
James Betker
931aa65dd0
Allow recurrent losses to be weighted
2020-10-21 16:59:44 -06:00
James Betker
5753e77d67
ChainedGen: Output debugging information on blocks
2020-10-21 16:36:23 -06:00
James Betker
b54de69153
Misc
2020-10-21 11:08:21 -06:00
James Betker
71c3820d2d
Fix process_video
2020-10-21 11:08:12 -06:00
James Betker
3c6e600e48
Add capacity for models to self-report visuals
2020-10-21 11:08:03 -06:00
James Betker
dca5cddb3b
Add bypass to ChainedEmbeddingGen
2020-10-21 11:07:45 -06:00
James Betker
d8c6a4bbb8
Misc
2020-10-20 12:56:52 -06:00
James Betker
aba83e7497
Don't apply jpeg corruption & noise corruption together
...
This causes some severe noise.
2020-10-20 12:56:35 -06:00
James Betker
111450f4e7
Use areal interpolate for multiscale_dataset
2020-10-19 15:30:25 -06:00
James Betker
a63bf2ea2f
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-19 15:26:11 -06:00
James Betker
76e4f0c086
Restore test.py for use as standalone validator
2020-10-19 15:26:07 -06:00
James Betker
1b1ca297f8
Fix recurrent=None bug in ChainedEmbeddingGen
2020-10-19 15:25:12 -06:00
James Betker
331c40f0c8
Allow starting step to be forced
...
Useful for testing purposes or to force a validation.
2020-10-19 15:23:04 -06:00
James Betker
8ca566b621
Revert "Misc"
...
This reverts commit 0e3ea63a14
.
# Conflicts:
# codes/test.py
# codes/train.py
2020-10-19 13:34:54 -06:00
James Betker
b28e4d9cc7
Add spread loss
...
Experimental loss that peaks around 0.
2020-10-19 11:31:19 -06:00
James Betker
9b9a6e5925
Add get_paths() to base_unsupervised_image_dataset
2020-10-19 11:30:06 -06:00
James Betker
981d64413b
Support validation over a custom injector
...
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
ffad0e0422
Allow image corruption in multiscale dataset
2020-10-19 10:10:27 -06:00
James Betker
668cafa798
Push correct patch of recurrent embedding to upstream image, rather than whole thing
2020-10-18 22:39:52 -06:00
James Betker
7df378a944
Remove separated vgg discriminator
...
Checkpointing happens inline instead. Was a dumb idea..
Also fixes some loss reporting issues.
2020-10-18 12:10:24 -06:00
James Betker
c709d38cd5
Fix memory leak with recurrent loss
2020-10-18 10:22:10 -06:00
James Betker
552e70a032
Get rid of excessive checkpointed disc params
2020-10-18 10:09:37 -06:00
James Betker
6a0d5f4813
Add a checkpointable discriminator
2020-10-18 09:57:47 -06:00
James Betker
9ead2c0a08
Multiscale training in!
2020-10-17 22:54:12 -06:00
James Betker
e706911c83
Fix spinenet bug
2020-10-17 20:20:36 -06:00
James Betker
b008a27d39
Spinenet should allow bypassing the initial conv
...
This makes feeding in references for recurrence easier.
2020-10-17 20:16:47 -06:00
James Betker
c7f3fc4dd9
Enable chunk_with_reference to work without centers
...
Moving away from this so it doesn't matter too much. Also fixes an issue
with the "ignore" flag.
2020-10-17 20:09:08 -06:00
James Betker
b45e132a9d
Allow first n tiles to be ignored
...
Helps zoom in with chunked dataset
2020-10-17 09:45:03 -06:00
James Betker
c1c9c5681f
Swap recurrence
2020-10-17 08:40:28 -06:00
James Betker
6141aa1110
More recurrence fixes for chainedgen
2020-10-17 08:35:46 -06:00
James Betker
cf8118a85b
Allow recurrence to specified for chainedgen
2020-10-17 08:32:29 -06:00
James Betker
fc4c064867
Add recurrent support to chainedgenwithstructure
2020-10-17 08:31:34 -06:00
James Betker
d4a3e11ab2
Don't use several stages of spinenet_arch
...
These are used for lower outputs which I am not using
2020-10-17 08:28:37 -06:00
James Betker
d1c63ae339
Go back to torch's DDP
...
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
d856378b2e
Add ChainedGenWithStructure
2020-10-16 20:44:36 -06:00
James Betker
96f1be30ed
Add use_generator_as_filter
2020-10-16 20:43:55 -06:00
James Betker
617d97e19d
Add ChainedEmbeddingGen
2020-10-15 23:18:08 -06:00
James Betker
c4543ce124
Set post_transform_block to None where applicable
2020-10-15 17:20:42 -06:00
James Betker
6f8705e8cb
SSGSimpler network
2020-10-15 17:18:44 -06:00
James Betker
1ba01d69b5
Move datasets to INTER_AREA interpolation for downsizing
...
Looks **FAR** better visually
2020-10-15 17:18:23 -06:00
James Betker
d56745b2ec
JPEG-broad adjustment
2020-10-15 10:14:51 -06:00
James Betker
eda75c9779
Cleanup fixes
2020-10-15 10:13:17 -06:00
James Betker
920865defb
Arch work
2020-10-15 10:13:06 -06:00
James Betker
1dc0b05428
Add multiscale dataset
2020-10-15 10:12:50 -06:00
James Betker
0f4e03183f
New image corruptor gradations
2020-10-15 10:12:25 -06:00
James Betker
1f20d59c31
Revert big switch back
2020-10-14 11:03:34 -06:00
James Betker
9815980329
Update SwitchedConv
2020-10-13 20:57:12 -06:00
James Betker
24792bdb4f
Codebase cleanup
...
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
e620fc05ba
Mods to support video processing with teco networks
2020-10-13 20:47:05 -06:00
James Betker
17d78195ee
Mods to SRG to support returning switch logits
2020-10-13 20:46:37 -06:00
James Betker
cc915303a5
Fix SPSR calls into SwitchComputer
2020-10-13 10:14:47 -06:00
James Betker
bdf4c38899
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
...
# Conflicts:
# codes/models/archs/SwitchedResidualGenerator_arch.py
2020-10-13 10:12:26 -06:00
James Betker
9a5d6162e9
Add the "BigSwitch"
2020-10-13 10:11:10 -06:00
James Betker
8014f050ac
Clear metrics properly
...
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
4d52374e60
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-12 17:43:51 -06:00
James Betker
731700ab2c
checkpoint in ssg
2020-10-12 17:43:28 -06:00
James Betker
ca523215c6
Fix recurrent std in arch
2020-10-12 17:42:32 -06:00
James Betker
05377973bf
Allow initial recurrent input to be specified (optionally)
2020-10-12 17:36:43 -06:00
James Betker
597b6e92d6
Add ssgr1 recurrence
2020-10-12 17:18:19 -06:00
James Betker
c1a00f31b7
Update switched_conv
2020-10-12 10:37:45 -06:00
James Betker
d7d7590f3e
Fix constant injector - wasn't working in test
2020-10-12 10:36:30 -06:00
James Betker
e7cf337dba
Fix bug with chunk_with_reference
2020-10-12 10:23:03 -06:00
James Betker
ce163ad4a9
Update SSGdeep
2020-10-12 10:22:08 -06:00
James Betker
2bc5701b10
misc
2020-10-12 10:21:25 -06:00
James Betker
3409d88a1c
Add PANet arch
2020-10-12 10:20:55 -06:00
James Betker
7cbf4fa665
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-11 08:33:30 -06:00
James Betker
92cb83958a
Return zeros rather than None when image cant be read
2020-10-11 08:33:18 -06:00
James Betker
a9c2e97391
Constant injector and teco fixes
2020-10-11 08:20:07 -06:00
James Betker
e785029936
Mods needed to support SPSR archs with teco gan
2020-10-10 22:39:55 -06:00
James Betker
120072d464
Add constant injector
2020-10-10 21:50:23 -06:00
James Betker
f99812e14d
Fix tecogan_losses errors
2020-10-10 20:30:14 -06:00
James Betker
3a5b23b9f7
Alter teco_losses to feed a recurrent input in as separate
2020-10-10 20:21:09 -06:00
James Betker
0d30d18a3d
Add MarginRemoval injector
2020-10-09 20:35:56 -06:00
James Betker
0011d445c8
Fix loss indexing
2020-10-09 20:20:51 -06:00
James Betker
202eb11fdc
For element loss added
2020-10-09 19:51:44 -06:00
James Betker
61e5047c60
Fix loss accumulator when buffers are not filled
...
They were reporting incorrect losses.
2020-10-09 19:47:59 -06:00
James Betker
fe50d6f9d0
Fix attention images
2020-10-09 19:21:55 -06:00
James Betker
7e777ea34c
Allow tecogan to be used in process_video
2020-10-09 19:21:43 -06:00
James Betker
58d8bf8f69
Add network architecture built for teco
2020-10-09 08:40:14 -06:00
James Betker
b3d0baaf17
Improve multiframe dataset memory usage
2020-10-09 08:40:00 -06:00
James Betker
afe6af88af
Fix attention print issue
2020-10-08 18:34:00 -06:00
James Betker
4c85ee51a4
Converge SSG architectures into unified switching base class
...
Also adds attention norm histogram to logging
2020-10-08 17:23:21 -06:00
James Betker
3cc56cd00b
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-08 16:12:05 -06:00
James Betker
7d8d9dafbb
misc
2020-10-08 16:12:00 -06:00
James Betker
856ef4d21d
Update switched_conv
2020-10-08 16:10:23 -06:00
James Betker
1eb516d686
Fix more distributed bugs
2020-10-08 14:32:45 -06:00
James Betker
b36ba0460c
Fix multi-frame dataset OBO error
2020-10-08 12:21:04 -06:00
James Betker
fba29d7dcc
Move to apex distributeddataparallel and add switch all_reduce
...
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
c174ac0fd5
Allow tecogan to support generators that only output a tensor (instead of a list)
2020-10-08 09:26:25 -06:00
James Betker
969bcd9021
Use local checkpoint in SSG
2020-10-08 08:54:46 -06:00
James Betker
c93dd623d7
Tecogan losses work
2020-10-07 23:11:58 -06:00
James Betker
29bf78d791
Update switched_conv submodule
2020-10-07 23:11:50 -06:00
James Betker
c96f5b2686
Import switched_conv as a submodule
2020-10-07 23:10:54 -06:00
James Betker
c352c8bce4
More tecogan fixes
2020-10-07 12:41:17 -06:00
James Betker
a62a5dbb5f
Clone and detach in recursively_detach
2020-10-07 12:41:00 -06:00
James Betker
1c44d395af
Tecogan work
...
Its training! There's still probably plenty of bugs though..
2020-10-07 09:03:30 -06:00
James Betker
e9d7371a61
Add concatenate injector
2020-10-07 09:02:42 -06:00
James Betker
8a7e993aea
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-06 20:41:58 -06:00
James Betker
b2c4b2a16d
Move gpu_ids out of if statement
2020-10-06 20:40:20 -06:00
James Betker
1e415b249b
Add tag that can be applied to prevent parameter training
2020-10-06 20:39:49 -06:00
James Betker
2f2e3f33f8
StackedSwitchedGenerator_5lyr
2020-10-06 20:39:32 -06:00
James Betker
6217b48e3f
Fix spsr_arch bug
2020-10-06 20:38:47 -06:00
James Betker
4290918359
Add distributed_checkpoint for more efficient checkpoints
2020-10-06 20:38:38 -06:00
James Betker
cffc596141
Integrate flownet2 into codebase, add teco visual debugs
2020-10-06 20:35:39 -06:00
James Betker
e4b89a172f
Reduce spsr7 memory usage
2020-10-05 22:05:56 -06:00
James Betker
4111942ada
Support attention deferral in deep ssgr
2020-10-05 19:35:55 -06:00
James Betker
840927063a
Work on tecogan losses
2020-10-05 19:35:28 -06:00
James Betker
0e3ea63a14
Misc
2020-10-05 18:01:50 -06:00
James Betker
2875822024
SPSR9 arch
...
takes some of the stuff I learned with SGSR yesterday and applies it to spsr
2020-10-05 08:47:51 -06:00
James Betker
51044929af
Don't compute attention statistics on multiple generator invocations of the same data
2020-10-05 00:34:29 -06:00
James Betker
e760658fdb
Another fix..
2020-10-04 21:08:00 -06:00
James Betker
a890e3a9c0
Fix geometric loss not handling 0 index
2020-10-04 21:05:01 -06:00
James Betker
c3ef8a4a31
Stacked switches - return a tuple
2020-10-04 21:02:24 -06:00
James Betker
13f97e1e97
Add recursive loss
2020-10-04 20:48:15 -06:00
James Betker
ffd069fd97
Lots of SSG work
...
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
aca2c7ab41
Full checkpoint-ize SSG1
2020-10-04 18:24:52 -06:00
James Betker
fc396baf1a
Move loaded_options to util
...
Doesn't seem to work with python 3.6
2020-10-03 20:29:06 -06:00
James Betker
2d8e9a9d30
Options fix?
2020-10-03 20:27:12 -06:00
James Betker
e3294939b0
Revert "SSG: offer option to use BN-based attention normalization"
...
Didn't work. Oh well.
This reverts commit 5cd2b37591
.
2020-10-03 17:54:53 -06:00
James Betker
43c6c67fd1
Merge remote-tracking branch 'origin/gan_lab' into gan_lab
2020-10-03 16:17:31 -06:00
James Betker
5cd2b37591
SSG: offer option to use BN-based attention normalization
...
Not sure how this is going to work, lets try it.
2020-10-03 16:16:19 -06:00
James Betker
c896939523
Fix recursive checkpoint
2020-10-03 16:15:52 -06:00
James Betker
3cbb9ecd45
Misc
2020-10-03 16:15:42 -06:00
James Betker
35731502c3
Fix checkpoint recursion
2020-10-03 12:52:50 -06:00
James Betker
9b4ed82093
Get rid of unused convs in spsr7
2020-10-03 11:36:26 -06:00
James Betker
b2b81b13a4
Remove recursive utils import
2020-10-03 11:30:05 -06:00
James Betker
3561cc164d
Fix up fea_loss calculator (for validation)
...
Not sure how this was working in regular training mode, but it
was failing in DDP.
2020-10-03 11:19:20 -06:00
James Betker
21d3bb83b2
Use tqdm reporting with validation
2020-10-03 11:16:39 -06:00
James Betker
6c9718ad64
Don't log if you aren't 0 rank
2020-10-03 11:14:13 -06:00
James Betker
922b1d76df
Don't record visuals when not on rank 0
2020-10-03 11:10:03 -06:00
James Betker
8197fd646f
Don't accumulate losses for metrics when the loss isn't a tensor
2020-10-03 11:03:55 -06:00
James Betker
19a4075e1e
Allow checkpointing to be disabled in the options file
...
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
dd9d7b27ac
Add more sophisticated mechanism for balancing GAN losses
2020-10-02 22:53:42 -06:00
James Betker
39865ca3df
TOTAL_loss, dumbo
2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655
Loss accumulator fix
2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4
ExtensibleTrainer - don't compute backward when there is no loss
2020-10-02 20:54:06 -06:00
James Betker
146a9125f2
Modify geometric & translational losses so they can be used with embeddings
2020-10-02 20:40:13 -06:00
James Betker
e30a1443cd
Change sw2 refs
2020-10-02 09:01:18 -06:00
James Betker
e38716925f
Fix spsr8 class init
2020-10-02 09:00:18 -06:00
James Betker
efbf6b737b
Update validate_data to work with SingleImageDataset
2020-10-02 08:58:34 -06:00
James Betker
35469f08e2
Spsr 8
2020-10-02 08:58:15 -06:00
James Betker
c9a9e5c525
Prompt user for gpu_id if multiple gpus are detected
2020-10-01 17:24:50 -06:00
James Betker
aa4fd89018
resnext with groupnorm
2020-10-01 15:49:28 -06:00
James Betker
8beaa47933
resnext discriminator
2020-10-01 11:48:14 -06:00
James Betker
55f2764fef
Allow fixup50 to be used as a discriminator
2020-10-01 11:28:18 -06:00
James Betker
7986185fcb
Change 'mod_step' to 'every'
2020-10-01 11:28:06 -06:00
James Betker
d9ae970fd9
SSG update
2020-10-01 11:27:51 -06:00
James Betker
e3053e4e55
Exchange SpsrNet for SpsrNetSimplified
2020-09-30 17:01:04 -06:00
James Betker
66d4512029
Fix up translational equivariance loss so it's ready for prime time
2020-09-30 12:01:00 -06:00
James Betker
896b4f5be2
Revert "spsr7 adjustments"
...
This reverts commit 9fee1cec71
.
2020-09-29 18:30:41 -06:00
James Betker
9fee1cec71
spsr7 adjustments
2020-09-29 17:19:59 -06:00
James Betker
dc8f3b24de
Don't let duplicate keys be used for injectors and losses
2020-09-29 16:59:44 -06:00
James Betker
0b5a033503
spsr7 + cleanup
...
SPSR7 adds ref onto spsr6, makes more "common sense" mods.
2020-09-29 16:59:26 -06:00
James Betker
f9b83176f1
Fix bugs in extensibletrainer
2020-09-28 22:09:42 -06:00
James Betker
db52bec4ab
spsr6
...
This is meant to be a variant of SPSR5 that harkens
back to the simpler earlier architectures that do not
have embeddings or ref_ inputs, but do have deep
multiplexers. It does, however, use some of the new
conjoin mechanisms.
2020-09-28 22:09:27 -06:00
James Betker
7e240f2fed
Recurrent / teco work
2020-09-28 22:06:56 -06:00
James Betker
57814f18cf
More features for multi-frame-dataset
2020-09-28 14:26:15 -06:00
James Betker
aeaf185314
Add RCAN
2020-09-27 16:00:41 -06:00
James Betker
4d29b7729e
Model arch cleanup
2020-09-27 11:18:45 -06:00
James Betker
7dff802144
Add MultiFrameDataset
...
Retrieves video sequence patches rather than single images.
2020-09-27 11:13:06 -06:00
James Betker
d8c3fc9327
Fix random noise corruptor
...
It was functioning as a color shift
2020-09-27 11:12:24 -06:00
James Betker
c85da79697
Move many dataset functions into a base class
2020-09-27 11:11:58 -06:00
James Betker
eb12b5f887
Misc
2020-09-26 21:27:17 -06:00
James Betker
31641d7f63
Add ImagePatchInjector and TranslationalLoss
2020-09-26 21:25:32 -06:00
James Betker
d8621e611a
BackboneSpineNoHead takes ref
2020-09-26 21:25:04 -06:00
James Betker
5a27187c59
More mods to accomodate new dataset
2020-09-25 22:45:57 -06:00
James Betker
254cb1e915
More dataset integration work
2020-09-25 22:19:38 -06:00
James Betker
6d0490a0e6
Tecogan implementation work
2020-09-25 16:38:23 -06:00
James Betker
ce4613ecb9
Finish up single_image_dataset work
...
Sweet!
2020-09-25 16:37:54 -06:00
James Betker
1cf73c2cce
Fix dataset for a val set that includes lq
2020-09-24 18:01:07 -06:00
James Betker
ea565b7eaf
More fixes
2020-09-24 17:51:52 -06:00
James Betker
553917a8d1
Fix torchvision import bug
2020-09-24 17:38:34 -06:00
James Betker
58886109d4
Update how spsr arches do attention to conform with sgsr
2020-09-24 16:53:54 -06:00
James Betker
9a50a7966d
SiLU doesnt support inplace
2020-09-23 21:09:13 -06:00
James Betker
eda0eadba2
Use custom SiLU
...
Torch didnt have this before 1.7
2020-09-23 21:05:06 -06:00
James Betker
05963157c1
Several things
...
- Fixes to 'after' and 'before' defs for steps (turns out they werent working)
- Feature nets take in a list of layers to extract. Not fully implemented yet.
- Fixes bugs with RAGAN
- Allows real input into generator gan to not be detached by param
2020-09-23 11:56:36 -06:00
James Betker
4ab989e015
try again..
2020-09-22 18:27:52 -06:00
James Betker
3b6c957194
Fix? it again?
2020-09-22 18:25:59 -06:00
James Betker
7b60d9e0d8
Fix? cosine loss
2020-09-22 18:18:35 -06:00
James Betker
2e18c4c22d
Add CosineEmbeddingLoss to F
2020-09-22 17:10:29 -06:00
James Betker
f40beb5460
Add 'before' and 'after' defs to injections, steps and optimizers
2020-09-22 17:03:22 -06:00
James Betker
419f77ec19
Some new backbones
2020-09-21 12:36:49 -06:00
James Betker
9429544a60
Spinenet: implementation without 4x downsampling right off the bat
2020-09-21 12:36:30 -06:00
James Betker
384e3d54cc
Extract images into jpg, have a multiplier & size threshold
2020-09-21 12:36:03 -06:00
James Betker
bde35ced47
Fix recursive detach
2020-09-20 19:08:13 -06:00
James Betker
53a5657850
Fix SSGR
2020-09-20 19:07:15 -06:00
James Betker
17c569ea62
Add geometric loss
2020-09-20 16:24:23 -06:00
James Betker
17dd99b29b
Fix bug with discriminator noise addition
...
It wasn't using the scale and was applying the noise to the
underlying state variable.
2020-09-20 12:00:27 -06:00
James Betker
dab8ab8a8f
Offer option to configure the size of the normal distribution that the target size is drawn from
2020-09-20 11:59:31 -06:00
James Betker
3138f98fbc
Allow discriminator noise to be injected at the loss level, cleans up configs
2020-09-19 21:47:52 -06:00