Commit Graph

20 Commits

Author SHA1 Message Date
James Betker
8a83b1c716 Go back to apex DDP, fix distributed bugs 2020-12-04 16:39:21 -07:00
James Betker
5ccdbcefe3 srflow_orig integration 2020-11-19 23:47:24 -07:00
James Betker
f133243ac8 Extra logging for teco_resgen 2020-10-28 15:21:22 -06:00
James Betker
231137ab0a Revert RRDB back to original model 2020-10-27 10:25:31 -06:00
James Betker
1ce863849a Remove temporary base_model change 2020-10-26 11:13:01 -06:00
James Betker
629b968901 ChainedGen 4x alteration
Increases conv window for teco_recurrent in the 4x case so all data
can be used.

base_model changes should be temporary.
2020-10-26 10:54:51 -06:00
James Betker
9c3d059ef0 Updates to be able to train flownet2 in ExtensibleTrainer
Only supports basic losses for now, though.
2020-10-24 11:56:39 -06:00
James Betker
d1c63ae339 Go back to torch's DDP
Apex was having some weird crashing issues.
2020-10-16 20:47:35 -06:00
James Betker
fba29d7dcc Move to apex distributeddataparallel and add switch all_reduce
Torch's distributed_data_parallel is missing "delay_allreduce", which is
necessary to get gradient checkpointing to work with recurrent models.
2020-10-08 11:20:05 -06:00
James Betker
922b1d76df Don't record visuals when not on rank 0 2020-10-03 11:10:03 -06:00
James Betker
d0321ca5de Don't load amp state dict if amp is disabled 2020-09-14 15:21:42 -06:00
James Betker
ec2a795d53 Fix multistep optimizer (feeding from wrong config params) 2020-08-04 16:42:58 -06:00
James Betker
e37726f302 Add feature_model for training custom feature nets 2020-07-31 11:20:39 -06:00
James Betker
61364ec7d0 Fix inverse temperature curve logic and add upsample factor 2020-06-19 09:18:30 -06:00
James Betker
efc80f041c Save & load amp state 2020-06-18 11:38:48 -06:00
James Betker
f1a1fd14b1 Introduce (untested) colab mode 2020-06-01 15:09:52 -06:00
James Betker
b95c4087d1 Allow an alt_path for saving models and states 2020-05-16 09:10:51 -06:00
James Betker
c8ab89d243 Add model swapout
Model swapout is a feature where, at specified intervals,
a random D and G model will be swapped in place for the
one being trained. After a short period of time, this model
is swapped back out. This is intended to increase training
diversity.
2020-05-13 16:53:38 -06:00
James Betker
4f6d3f0dfb Enable AMP optimizations & write sample train images to folder. 2020-04-21 16:28:06 -06:00
XintaoWang
037933ba66 mmsr 2019-08-23 21:42:47 +08:00
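
Note on c8ab89d243 (model swapout): the commit message describes swapping a random D and G model in at specified intervals and swapping it back out after a short period. The scheduling part might be sketched as below. This is only an illustration, not the repository's code: the class name, the `swap_interval` and `swap_duration` parameters, and the idea that candidates are named checkpoints are all assumptions made for the example.

```python
import random


class SwapoutSchedule:
    """Hypothetical scheduler: decides per step whether a swapped-in model is active.

    None of these names come from the repository; they only illustrate the idea.
    """

    def __init__(self, swap_interval, swap_duration, seed=None):
        if swap_duration >= swap_interval:
            raise ValueError("swap_duration must be shorter than swap_interval")
        self.swap_interval = swap_interval  # steps between swap-ins (assumed)
        self.swap_duration = swap_duration  # steps a swapped model stays in (assumed)
        self.rng = random.Random(seed)
        self.active = None                  # currently swapped-in candidate, if any
        self.swap_started_at = None

    def step(self, step, candidates):
        """Return the candidate swapped in at `step`, or None for the main model."""
        if self.active is not None:
            # A swap is in progress: end it once the short period has elapsed.
            if step - self.swap_started_at >= self.swap_duration:
                self.active = None
                self.swap_started_at = None
        elif step > 0 and step % self.swap_interval == 0 and candidates:
            # At each interval boundary, pick a random candidate to swap in.
            self.active = self.rng.choice(candidates)
            self.swap_started_at = step
        return self.active


if __name__ == "__main__":
    schedule = SwapoutSchedule(swap_interval=10, swap_duration=3, seed=0)
    for s in range(25):
        swapped = schedule.step(s, ["ckpt_a", "ckpt_b"])
        print(s, swapped if swapped is not None else "main model")
```

In a real trainer the returned name would presumably select which D or G weights to load for that step; how the repository actually stores and restores those models is not stated in the log.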