Latest commit fba29d7dcc: Torch's distributed_data_parallel is missing "delay_allreduce", which is necessary to get gradient checkpointing to work with recurrent models.
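As context for the commit message above, here is a minimal sketch (my own illustration, not code from this repo) of gradient checkpointing with `torch.utils.checkpoint`. Because a checkpointed segment recomputes its activations during backward, gradients for shared or reused parameters arrive late, which can clash with `DistributedDataParallel`'s eager per-bucket allreduce; Apex's `apex.parallel.DistributedDataParallel(delay_allreduce=True)` sidesteps this by deferring the allreduce until the whole backward pass has finished. The model and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """A small residual-style block; stands in for one recurrent step."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.net(x)


class CheckpointedModel(nn.Module):
    """Applies the same blocks under gradient checkpointing."""

    def __init__(self, dim=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            # Activations inside blk are discarded in forward and
            # recomputed during backward, trading compute for memory.
            x = checkpoint(blk, x, use_reentrant=False)
        return x


model = CheckpointedModel()
x = torch.randn(8, 16, requires_grad=True)
model(x).sum().backward()
# Every parameter receives a gradient despite the recomputation.
assert all(p.grad is not None for p in model.parameters())
```

Wrapping such a model in torch's `DistributedDataParallel` is where the commit message says the missing `delay_allreduce` option bites; with Apex, one would instead wrap it as `apex.parallel.DistributedDataParallel(model, delay_allreduce=True)`.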
__init__.py
arch_util.py
discriminator_vgg_arch.py
DiscriminatorResnet_arch_passthrough.py
DiscriminatorResnet_arch.py
feature_arch.py
ProgressiveSrg_arch.py
rcan.py
ResGen_arch.py
RRDBNet_arch.py
spinenet_arch.py
SPSR_arch.py
SPSR_util.py
SRResNet_arch.py
StructuredSwitchedGenerator.py
SwitchedResidualGenerator_arch.py