James Betker
fba29d7dcc
Move to apex distributeddataparallel and add switch all_reduce
...
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
922b1d76df
Don't record visuals when not on rank 0
2020-10-03 11:10:03 -06:00
James Betker
d0321ca5de
Don't load amp state dict if amp is disabled
2020-09-14 15:21:42 -06:00
James Betker
ec2a795d53
Fix multistep optimizer (feeding from wrong config params)
2020-08-04 16:42:58 -06:00
James Betker
e37726f302
Add feature_model for training custom feature nets
2020-07-31 11:20:39 -06:00
James Betker
61364ec7d0
Fix inverse temperature curve logic and add upsample factor
2020-06-19 09:18:30 -06:00
James Betker
efc80f041c
Save & load amp state
2020-06-18 11:38:48 -06:00
James Betker
f1a1fd14b1
Introduce (untested) colab mode
2020-06-01 15:09:52 -06:00
James Betker
b95c4087d1
Allow an alt_path for saving models and states
2020-05-16 09:10:51 -06:00
James Betker
c8ab89d243
Add model swapout
...
Model swapout is a feature where, at specified intervals,
a random D and G model will be swapped in place for the
one being trained. After a short period of time, this model
is swapped back out. This is intended to increase training
diversity.
2020-05-13 16:53:38 -06:00
James Betker
4f6d3f0dfb
Enable AMP optimizations & write sample train images to folder.
2020-04-21 16:28:06 -06:00
XintaoWang
037933ba66
mmsr
2019-08-23 21:42:47 +08:00