12 Commits

Author SHA1 Message Date
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
922b1d76df Don't record visuals when not on rank 0 2020-10-03 11:10:03 -06:00
James Betker
d0321ca5de Don't load amp state dict if amp is disabled 2020-09-14 15:21:42 -06:00
James Betker
ec2a795d53 Fix multistep optimizer (feeding from wrong config params) 2020-08-04 16:42:58 -06:00
James Betker
e37726f302 Add feature_model for training custom feature nets 2020-07-31 11:20:39 -06:00
James Betker
61364ec7d0 Fix inverse temperature curve logic and add upsample factor 2020-06-19 09:18:30 -06:00
James Betker
efc80f041c Save & load amp state 2020-06-18 11:38:48 -06:00
James Betker
f1a1fd14b1 Introduce (untested) colab mode 2020-06-01 15:09:52 -06:00
James Betker
b95c4087d1 Allow an alt_path for saving models and states 2020-05-16 09:10:51 -06:00
James Betker
c8ab89d243 Add model swapout
Model swapout is a feature where, at specified intervals,
a random D and G model will be swapped in place for the
one being trained. After a short period of time, this model
is swapped back out. This is intended to increase training
diversity.
2020-05-13 16:53:38 -06:00
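
The model swapout described in c8ab89d243 can be illustrated with a small scheduling sketch. This is not the repository's implementation: the class name `SwapoutScheduler`, the `interval` and `duration` parameters, and the idea of a `pool` of previously saved models are all assumptions made for illustration. The sketch handles a single network; under the same assumptions, one scheduler instance each could be used for D and G.

```python
import random


class SwapoutScheduler:
    """Hypothetical helper (not from the repository).

    Every `interval` steps, a randomly chosen model from a pool is swapped in
    for the model being trained. After `duration` steps the original model is
    swapped back.
    """

    def __init__(self, interval, duration, rng=None):
        if duration >= interval:
            raise ValueError("duration must be shorter than interval")
        self.interval = interval
        self.duration = duration
        self.rng = rng or random.Random()
        self.stashed = None       # the trained model, set aside during a swap
        self.swap_started = None  # step at which the current swap began

    def step(self, step, current, pool):
        """Return the model that should be trained at `step`."""
        if self.stashed is not None:
            # A swap is active: restore the original once the period is over.
            if step - self.swap_started >= self.duration:
                restored, self.stashed = self.stashed, None
                return restored
            return current
        if step > 0 and step % self.interval == 0 and pool:
            # Start a swap: stash the trained model and pick a random one.
            self.stashed = current
            self.swap_started = step
            return self.rng.choice(pool)
        return current


if __name__ == "__main__":
    # Strings stand in for real networks to keep the sketch self-contained.
    scheduler = SwapoutScheduler(interval=10, duration=3)
    model = "main"
    for s in range(25):
        model = scheduler.step(s, model, ["old_a", "old_b"])
        print(s, model)
```

With `interval=10` and `duration=3`, a pooled model is active on steps 10 to 12 and 20 to 22, and the original model is trained on all other steps.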
James Betker
4f6d3f0dfb Enable AMP optimizations & write sample train images to folder. 2020-04-21 16:28:06 -06:00
XintaoWang
037933ba66 mmsr 2019-08-23 21:42:47 +08:00