Commit Graph

39 Commits

Author SHA1 Message Date
James Betker
ea56eb61f0 Fix DDP errors for discriminator
- Don't define training_net in define_optimizers - this drops the shell and leads to problems downstream
- Get rid of support for multiple training nets per opt. This was half baked and needs a better solution if needed
   downstream.
2020-12-07 12:50:57 -07:00
James Betker
f6098155cd Mods to tecogan to allow use of embeddings as input 2020-11-24 09:24:02 -07:00
James Betker
5ccdbcefe3 srflow_orig integration 2020-11-19 23:47:24 -07:00
James Betker
2d3449d7a5 stylegan2 in ml art school! 2020-11-12 15:42:05 -07:00
James Betker
fd6cdba88f RRDB with latent 2020-11-05 10:04:17 -07:00
James Betker
df47d6cbbb More work in support of training flow networks in tandem with generators 2020-11-04 18:07:48 -07:00
James Betker
658a267bab More work on SSIM/PSNR approximators
- Add a network that accomodates this style of approximator while retaining structure
- Migrate to SSIM approximation
- Add a tool to visualize how these approximators are working
- Fix some issues that came up while doign this work
2020-11-03 08:09:58 -07:00
James Betker
515905e904 Add a min_loss that is DDP compatible 2020-10-28 15:46:59 -06:00
James Betker
da53090ce6 More adjustments to support distributed training with teco & on multi_modal_train 2020-10-27 20:58:03 -06:00
James Betker
2a3eec8fd7 Fix some distributed training snafus 2020-10-27 15:24:05 -06:00
James Betker
15e00e9014 Finish integration with autocast
Note: autocast is broken when also using checkpoint(). Overcome this by modifying
torch's checkpoint() function in place to also use autocast.
2020-10-22 14:39:19 -06:00
James Betker
d7ee14f721 Move to torch.cuda.amp (not working)
Running into OOM errors, needs diagnosing. Checkpointing here.
2020-10-22 13:58:05 -06:00
James Betker
24792bdb4f Codebase cleanup
Removed a lot of legacy stuff I have no intent on using again.
Plan is to shape this repo into something more extensible (get it? hah!)
2020-10-13 20:56:39 -06:00
James Betker
8014f050ac Clear metrics properly
Holy cow, what a PITA bug.
2020-10-13 10:07:49 -06:00
James Betker
8197fd646f Don't accumulate losses for metrics when the loss isn't a tensor 2020-10-03 11:03:55 -06:00
James Betker
39865ca3df TOTAL_loss, dumbo 2020-10-02 21:06:10 -06:00
James Betker
4e44fcd655 Loss accumulator fix 2020-10-02 20:55:33 -06:00
James Betker
567b4d50a4 ExtensibleTrainer - don't compute backward when there is no loss 2020-10-02 20:54:06 -06:00
James Betker
dc8f3b24de Don't let duplicate keys be used for injectors and losses 2020-09-29 16:59:44 -06:00
James Betker
f9b83176f1 Fix bugs in extensibletrainer 2020-09-28 22:09:42 -06:00
James Betker
31641d7f63 Add ImagePatchInjector and TranslationalLoss 2020-09-26 21:25:32 -06:00
James Betker
6d0490a0e6 Tecogan implementation work 2020-09-25 16:38:23 -06:00
James Betker
f40beb5460 Add 'before' and 'after' defs to injections, steps and optimizers 2020-09-22 17:03:22 -06:00
James Betker
e9a39bfa14 Recursively detach all outputs, even if they are nested in data structures 2020-09-19 21:47:34 -06:00
James Betker
9a17ade550 Some convenience adjustments to ExtensibleTrainer 2020-09-17 21:05:32 -06:00
James Betker
5b85f891af Only log the name of the first network in the total_loss training set 2020-09-12 16:07:09 -06:00
James Betker
fb595e72a4 Supporting infrastructure in ExtensibleTrainer to train spsr4
Need to be able to train 2 nets in one step: the backbone will be entirely separate
with its own optimizer (for an extremely low LR).

This functionality was already present, just not implemented correctly.
2020-09-11 22:57:06 -06:00
James Betker
5189b11dac Add combined dataset for training across multiple datasets 2020-09-11 08:44:06 -06:00
James Betker
3027e6e27d Enable amp to be disabled 2020-09-09 10:45:59 -06:00
James Betker
c04f244802 More mods 2020-09-08 20:36:27 -06:00
James Betker
e8613041c0 Add novograd optimizer 2020-09-06 17:27:08 -06:00
James Betker
21ae135f23 Allow Novograd to be used as an optimizer 2020-09-05 16:50:13 -06:00
James Betker
0dfd8eaf3b Support injectors that run in eval only 2020-09-05 07:59:45 -06:00
James Betker
4b4d08bdec Enable testing in ExtensibleTrainer, fix it in SRGAN_model
Also compute fea loss for this.
2020-08-31 09:41:48 -06:00
James Betker
dffc15184d More ExtensibleTrainer work
It runs now, just need to debug it to reach performance parity with SRGAN. Sweet.
2020-08-23 17:22:45 -06:00
James Betker
e59e712e39 More ExtensibleTrainer work 2020-08-22 13:08:33 -06:00
James Betker
f40545f235 ExtensibleTrainer work 2020-08-22 08:24:34 -06:00
James Betker
74cdaa2226 Some work on extensible trainer 2020-08-18 08:49:32 -06:00
James Betker
ab04ca1778 Extensible trainer (in progress) 2020-08-12 08:45:23 -06:00