forked from mrq/DL-Art-School
Commit fba29d7dcc
Torch's DistributedDataParallel is missing "delay_allreduce", which is necessary to get gradient checkpointing to work with recurrent models.
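For context, here is a minimal sketch of the workaround the commit message points at (this is not code from this repo, and RecurrentNet is a hypothetical toy model). NVIDIA Apex's DistributedDataParallel accepts delay_allreduce=True, which defers the gradient all-reduce until the whole backward pass completes, so activations recomputed by torch.utils.checkpoint and weights reused across recurrent steps do not trip the bucketed, fire-as-ready all-reduce that torch's DDP performs:

```python
import torch
import torch.distributed as dist
from torch.utils.checkpoint import checkpoint
from apex.parallel import DistributedDataParallel as ApexDDP  # NVIDIA Apex


class RecurrentNet(torch.nn.Module):
    """Hypothetical toy model: one Linear cell applied repeatedly,
    so its weights participate in backward once per step."""

    def __init__(self, dim=64, steps=4):
        super().__init__()
        self.cell = torch.nn.Linear(dim, dim)
        self.steps = steps

    def forward(self, x):
        # checkpoint() only propagates grads if some input requires them.
        x = x.requires_grad_()
        for _ in range(self.steps):
            # Drop this step's activations; recompute them during backward.
            x = torch.relu(checkpoint(self.cell, x))
        return x


# Assumes a torchrun/env:// style launch that sets RANK, WORLD_SIZE, etc.
dist.init_process_group(backend="nccl")
model = RecurrentNet().cuda()

# delay_allreduce=True makes Apex accumulate gradients locally and run a
# single all-reduce after backward finishes, instead of reducing each
# bucket as soon as its grads look "ready", which checkpoint recomputation
# and reused recurrent weights can trigger more than once.
model = ApexDDP(model, delay_allreduce=True)

out = model(torch.randn(8, 64, device="cuda"))
out.sum().backward()
```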
archs/
experiments/
flownet2 @ 2e9e010c98 (submodule)
layers/
steps/
__init__.py
base_model.py
ExtensibleTrainer.py
feature_model.py
loss.py
lr_scheduler.py
networks.py
novograd.py
SR_model.py
SRGAN_model.py