James Betker
4d8064c32c
Modifications to allow partially trained stylegan discriminators to be used
2021-01-03 16:37:18 -07:00
James Betker
ce6524184c
Do the last commit but in a better way
2021-01-02 22:24:12 -07:00
James Betker
edf9c38198
Make ExtensibleTrainer set the starting step for the LR scheduler
2021-01-02 22:22:34 -07:00
James Betker
bdbab65082
Allow optimizers to train separate param groups, add higher dimensional VGG discriminator
...
Did this to support training 512x512px networks off of a pretrained 256x256 network.
2021-01-02 15:10:06 -07:00
James Betker
193cdc6636
Move discriminators to the create_model paradigm
...
Also cleans up a lot of old discriminator models that I have no intention
of using again.
2021-01-01 15:56:09 -07:00
James Betker
9864fe4c04
Fix for train.py
2021-01-01 11:59:00 -07:00
James Betker
0eb1f4dd67
Revert "Get rid of CUDA_VISIBLE_DEVICES"
...
It is actually necessary for training in distributed mode. Only
do it then.
2020-12-31 10:31:40 -07:00
James Betker
1de1fa30ac
Disable refs and centers altogether in single_image_dataset
...
I suspect that this might be a cause of failures on parallel datasets.
Plus it is unnecessary computation.
2020-12-31 10:13:24 -07:00
James Betker
8f0984cacf
Add sr_fid evaluator
2020-12-30 20:18:58 -07:00
James Betker
a777c1e4f9
Misc script fixes
2020-12-29 20:25:09 -07:00
James Betker
3fd627fc62
Mods to support image classification & filtering
2020-12-26 13:49:27 -07:00
James Betker
29db7c7a02
Further mods to BYOL
2020-12-24 09:28:41 -07:00
James Betker
1bbcb96ee8
Implement a few changes to support training BYOL networks
2020-12-23 10:50:23 -07:00
James Betker
2437b33e74
Fix srflow_latent_space_playground bug
2020-12-22 15:42:38 -07:00
James Betker
e82f4552db
Update other docs with dumb config options
2020-12-18 16:21:28 -07:00
James Betker
5640e4efe4
More refactoring
2020-12-18 09:18:34 -07:00
James Betker
a8179ff53c
Image label work
2020-12-18 08:53:18 -07:00
James Betker
fb2cfc795b
Update requirements, add image_patch_classifier tool
2020-12-16 09:42:50 -07:00
James Betker
e5a3e6b9b5
srflow latent space misc
2020-12-14 23:59:49 -07:00
James Betker
ec0ee25f4b
Structural latents checkpoint
2020-12-11 12:01:09 -07:00
James Betker
a5630d282f
Get rid of 2nd trainer
2020-12-10 09:57:38 -07:00
James Betker
11155aead4
Directly use dataset keys
...
This has been a long time coming. Cleans up messy "GT" nomenclature and simplifies ExtensibleTrainer.feed_data
2020-12-04 20:14:53 -07:00
James Betker
8a83b1c716
Go back to apex DDP, fix distributed bugs
2020-12-04 16:39:21 -07:00
James Betker
8a00f15746
Implement FlowGaussianNll evaluator
2020-12-02 14:09:54 -07:00
James Betker
2e0bbda640
Remove unused archs
2020-12-01 11:10:48 -07:00
James Betker
da604752e6
Misc RRDB changes
2020-11-29 12:21:31 -07:00
James Betker
a1d4c9f83c
multires rrdb work
2020-11-28 14:35:46 -07:00
James Betker
ef8d5f88c1
Bring split gaussian nll out of split so it can be computed accurately with the rest of the nll component
2020-11-27 13:30:21 -07:00
James Betker
fd356580c0
Play with lambdas
2020-11-26 20:30:55 -07:00
James Betker
45a489110f
Fix datasets
2020-11-26 11:50:38 -07:00
James Betker
f6098155cd
Mods to tecogan to allow use of embeddings as input
2020-11-24 09:24:02 -07:00
James Betker
b10bcf6436
Rework stylegan_for_sr to incorporate structure as an adain block
2020-11-23 11:31:11 -07:00
James Betker
5ccdbcefe3
srflow_orig integration
2020-11-19 23:47:24 -07:00
James Betker
d7877d0a36
Fixes to teco losses and translational losses
2020-11-19 11:35:05 -07:00
James Betker
6b679e2b51
Make grad_penalty available to classical discs
2020-11-17 18:31:40 -07:00
James Betker
8a19c9ae15
Add additive mode to rrdb
2020-11-16 20:45:09 -07:00
James Betker
125cb16dce
Add a FID evaluator for stylegan with structural guidance
2020-11-14 20:16:07 -07:00
James Betker
ec621c69b5
Fix train bug
2020-11-14 09:29:08 -07:00
James Betker
a07e1a7292
Add separate Evaluator module and FID evaluator
2020-11-13 11:03:54 -07:00
James Betker
fc55bdb24e
Mods to how wandb is integrated
2020-11-12 15:45:25 -07:00
James Betker
db9e9e28a0
Fix an issue where GPU0 was always being used in non-ddp
...
Frankly, I don't understand how this has ever worked. WTF.
2020-11-12 15:43:01 -07:00
James Betker
88f349bdf1
Enable usage of wandb
2020-11-11 21:48:56 -07:00
James Betker
6a2fd5f7d0
Lots of new discriminator nets
2020-11-10 16:06:54 -07:00
James Betker
0cf52ef52c
latent work
2020-11-06 20:38:23 -07:00
James Betker
74738489b9
Fixes and additional support for progressive zoom
2020-10-30 09:59:54 -06:00
James Betker
607ff3c67c
RRDB with bypass
2020-10-29 09:39:45 -06:00
James Betker
da53090ce6
More adjustments to support distributed training with teco & on multi_modal_train
2020-10-27 20:58:03 -06:00
James Betker
2a3eec8fd7
Fix some distributed training snafus
2020-10-27 15:24:05 -06:00
James Betker
ff58c6484a
Fixes to unified chunk datasets to support stereoscopic training
2020-10-26 11:12:22 -06:00
James Betker
9c3d059ef0
Updates to be able to train flownet2 in ExtensibleTrainer
...
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
8636492db0
Copy train.py mods to train2
2020-10-22 17:16:36 -06:00
James Betker
e9c0b9f0fd
More adjustments to support multi-modal training
...
Specifically - looks like at least MSE loss cannot handle autocasted tensors
2020-10-22 16:49:34 -06:00
James Betker
76789a456f
Class-ify train.py and work on multi-modal trainer
2020-10-22 16:15:31 -06:00
James Betker
3e3d2af1f3
Add multi-modal trainer
2020-10-22 13:27:32 -06:00
James Betker
5753e77d67
ChainedGen: Output debugging information on blocks
2020-10-21 16:36:23 -06:00
James Betker
d8c6a4bbb8
Misc
2020-10-20 12:56:52 -06:00
James Betker
331c40f0c8
Allow starting step to be forced
...
Useful for testing purposes or to force a validation.
2020-10-19 15:23:04 -06:00
James Betker
981d64413b
Support validation over a custom injector
...
Also re-enable PSNR
2020-10-19 11:01:56 -06:00
James Betker
9ead2c0a08
Multiscale training in!
2020-10-17 22:54:12 -06:00
James Betker
d856378b2e
Add ChainedGenWithStructure
2020-10-16 20:44:36 -06:00
James Betker
6f8705e8cb
SSGSimpler network
2020-10-15 17:18:44 -06:00
James Betker
24792bdb4f
Codebase cleanup
...
Removed a lot of legacy stuff I have no intention of using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
17d78195ee
Mods to SRG to support returning switch logits
2020-10-13 20:46:37 -06:00
James Betker
2bc5701b10
misc
2020-10-12 10:21:25 -06:00
James Betker
b2c4b2a16d
Move gpu_ids out of if statement
2020-10-06 20:40:20 -06:00
James Betker
0e3ea63a14
Misc
2020-10-05 18:01:50 -06:00
James Betker
ffd069fd97
Lots of SSG work
...
- Checkpointed pretty much the entire model - enabling recurrent inputs
- Added two new models for test - adding depth (again) and removing SPSR (in lieu of the new losses)
2020-10-04 20:48:08 -06:00
James Betker
fc396baf1a
Move loaded_options to util
...
Doesn't seem to work with Python 3.6
2020-10-03 20:29:06 -06:00
James Betker
3cbb9ecd45
Misc
2020-10-03 16:15:42 -06:00
James Betker
21d3bb83b2
Use tqdm reporting with validation
2020-10-03 11:16:39 -06:00
James Betker
6c9718ad64
Don't log if you aren't 0 rank
2020-10-03 11:14:13 -06:00
James Betker
19a4075e1e
Allow checkpointing to be disabled in the options file
...
Also makes options a global variable for usage in utils.
2020-10-03 11:03:28 -06:00
James Betker
c9a9e5c525
Prompt user for gpu_id if multiple gpus are detected
2020-10-01 17:24:50 -06:00
James Betker
0b5a033503
spsr7 + cleanup
...
SPSR7 adds ref onto spsr6, makes more "common sense" mods.
2020-09-29 16:59:26 -06:00
James Betker
eb12b5f887
Misc
2020-09-26 21:27:17 -06:00
James Betker
254cb1e915
More dataset integration work
2020-09-25 22:19:38 -06:00
James Betker
f211575e9d
Save models before validation
...
Validation often fails with OOM, wasting hours of training time.
Save models first.
2020-09-16 08:17:17 -06:00
James Betker
c833bd1eac
Misc changes
2020-09-15 20:57:59 -06:00
James Betker
747ded2bf7
Fixes to the spsr3
...
Some lessons learned:
- Biases are fairly important as a relief valve. They don't need to be everywhere, but
most computationally heavy branches should have a bias.
- GroupNorm in SPSR is not a great idea. Since image gradients are represented
in this model, normal means and standard deviations are not applicable. (imggrad
has a high representation of 0).
- Don't fuck with the mainline of any generative model. As much as possible, all
additions should be done through residual connections. Never pollute the mainline
with reference data; do that in branches. Polluting the mainline basically leaves the model untrainable.
2020-09-09 15:28:14 -06:00
James Betker
c04f244802
More mods
2020-09-08 20:36:27 -06:00
James Betker
e6207d4c50
SPSR3 work
...
SPSR3 is meant to fix whatever is causing the switching units
inside of the newer SPSR architectures to fail and basically
not use the multiplexers.
2020-09-08 15:14:23 -06:00
James Betker
a18ece62ee
Add updated spsr net for test
2020-09-07 17:01:48 -06:00
James Betker
e8613041c0
Add novograd optimizer
2020-09-06 17:27:08 -06:00
James Betker
6657a406ac
Mods needed to support training a corruptor again:
...
- Allow original SPSRNet to have a specifiable block increment
- Cleanup
- Bug fixes in code that hasn't been touched in a while.
2020-09-04 15:33:39 -06:00
James Betker
886d59d5df
Misc fixes & adjustments
2020-09-01 07:58:11 -06:00
James Betker
0a9b85f239
Fix vgg_gn input_img_factor
2020-08-31 09:50:30 -06:00
James Betker
4b4d08bdec
Enable testing in ExtensibleTrainer, fix it in SRGAN_model
...
Also compute fea loss for this.
2020-08-31 09:41:48 -06:00
James Betker
623f3b99b2
Stupid pathing..
2020-08-26 17:58:24 -06:00
James Betker
80aa83bfd2
Try copytree for tb_logger again.
2020-08-26 17:55:02 -06:00
James Betker
b593d8e7c3
Save tb_logger to alt_path
2020-08-26 17:45:07 -06:00
James Betker
f35b3ad28f
Fix val behavior for ExtensibleTrainer
2020-08-26 08:44:22 -06:00
James Betker
19487d9bbd
Fix distributed launch for large distributed runs
2020-08-25 15:42:59 -06:00
James Betker
a65b07607c
Reference network
2020-08-25 11:56:59 -06:00
James Betker
f9276007a8
More fixes to corrupt_fea
2020-08-23 17:52:18 -06:00
James Betker
dffc15184d
More ExtensibleTrainer work
...
It runs now, just need to debug it to reach performance parity with SRGAN. Sweet.
2020-08-23 17:22:45 -06:00
James Betker
afdd93fbe9
Grey feature
2020-08-22 13:41:38 -06:00
James Betker
40bb0597bb
misc
2020-08-18 08:50:24 -06:00
James Betker
0c98c61f4a
Enable start_step to be specified
2020-08-15 18:34:59 -06:00
James Betker
2d205f52ac
Unite spsr_arch switched gens
...
Found a pretty good basis model.
2020-08-12 17:04:45 -06:00
James Betker
bdaa67deb7
Misc
2020-08-12 08:46:15 -06:00