- Added LARS and SGD optimizer variants that support turning off certain
features for BN and bias layers
- Added a variant of pytorch's resnet model that supports gradient checkpointing.
- Modify the trainer infrastructure to support above
- Fix bug with BYOL (should have been nonfunctional)