forked from ecker/DL-Art-School
PyTorch's `DistributedDataParallel` is missing a `delay_allreduce` option, which is necessary to get gradient checkpointing to work with recurrent models.
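As a toy illustration (plain Python, no torch or apex; all names here are hypothetical), the sketch below models why eagerly all-reducing a gradient the moment its hook fires breaks under gradient checkpointing: recomputation can make the same shared weight's hook fire more than once per backward pass, so an eager reducer ships a partial gradient, while a "delayed allreduce" that reduces once after backward finishes sees the full accumulated gradient.

```python
# Toy model of the failure mode (not real torch/apex code).
# Under gradient checkpointing, a recurrent cell's shared weight can
# produce several grad contributions in one backward pass (one per
# recomputed segment). An eager reducer that all-reduces on the first
# hook firing drops the later contributions; a delayed reducer that
# accumulates locally and reduces once at the end does not.

def eager_reduce(grad_events, workers):
    # All-reduce (modeled as scaling by worker count) at the FIRST
    # hook firing per parameter; later recomputed contributions for
    # the same parameter are silently dropped -> partial gradient.
    reduced = {}
    for param, contribution in grad_events:
        if param not in reduced:
            reduced[param] = contribution * workers
    return reduced

def delayed_reduce(grad_events, workers):
    # Accumulate every contribution locally first, then reduce once
    # after the whole backward pass -- the "delay_allreduce" idea.
    local = {}
    for param, contribution in grad_events:
        local[param] = local.get(param, 0.0) + contribution
    return {p: g * workers for p, g in local.items()}

# Checkpointing makes the shared weight "w" fire twice in one backward:
events = [("w", 1.0), ("w", 2.0)]
print(eager_reduce(events, workers=2))    # {'w': 2.0}  (partial)
print(delayed_reduce(events, workers=2))  # {'w': 6.0}  (full)
```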