6676c89c0e
Paid tribute to the hypothetical wizard again: just using BNB's ADAM optimizer nets HUGE savings, but I don't know the output costs; will need to test.
2023-02-23 02:42:17 +00:00
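A minimal sketch (not this repo's actual wiring) of what the swap above typically looks like with the standard bitsandbytes API; the model and hyperparameters are illustrative assumptions:

```python
# Illustrative only: replace the stock optimizer with bitsandbytes' 8-bit Adam,
# which keeps optimizer state in 8 bits to cut optimizer memory.
import bitsandbytes as bnb
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
```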
4427d7fb84
initial conversion (errors out)
2023-02-22 23:07:05 +00:00
James Betker
47b34f5cb9
mup work check-in
2022-06-09 21:15:09 -06:00
James Betker
ab5acead0e
add exp loss for diffusion models
2022-05-15 21:50:38 -06:00
James Betker
2a29a71c37
attempt to force meaningful codes by adding a surrogate loss
2022-03-26 08:31:40 -06:00
James Betker
8b376e63d9
More improvements
2022-03-16 10:16:34 -06:00
James Betker
e045fb0ad7
fix clip grad norm with scaler
2022-03-13 16:28:23 -06:00
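For context on the fix above, a sketch of the standard AMP pattern: gradients must be unscaled before clipping, otherwise the clip threshold is applied to scaled magnitudes. The model and loss here are placeholders.

```python
import torch

model = torch.nn.Linear(32, 32).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 32, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()

scaler.scale(loss).backward()
# Unscale before clipping so max_norm refers to the true gradient norm.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```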
James Betker
3e5da71b16
add grad scaler scale to metrics
2022-03-08 15:52:42 -07:00
James Betker
6873ad6660
Support functionality
2022-03-03 21:52:16 -07:00
James Betker
f458f5d8f1
Abort early if losses reach NaN too often, and save the model
2022-02-24 20:55:30 -07:00
James Betker
18dc62453f
Don't step if NaN losses are encountered.
2022-02-24 17:45:08 -07:00
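A hedged sketch of the two NaN-handling behaviors described in the commits above: skip the optimizer step on a non-finite loss, and save then abort after too many. `model`, `optimizer`, and `loader` are assumed placeholders, and the threshold is an illustrative choice.

```python
import torch

MAX_NAN_STEPS = 10  # illustrative abort threshold
nan_steps = 0

for batch, target in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch), target)
    if not torch.isfinite(loss):
        nan_steps += 1
        if nan_steps >= MAX_NAN_STEPS:
            torch.save(model.state_dict(), "model_before_nan_abort.pth")
            raise RuntimeError("Too many non-finite losses; aborting.")
        continue  # don't step on a NaN/Inf loss
    loss.backward()
    optimizer.step()
```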
James Betker
03752c1cd6
Report NaN
2022-02-22 23:09:37 -07:00
James Betker
79e8f36d30
Convert CLIP models into new folder
2022-02-15 20:53:07 -07:00
James Betker
8f767b8b4f
...
2022-02-15 07:08:17 -07:00
James Betker
29e07913a8
Fix
2022-02-15 06:58:11 -07:00
James Betker
dd585df772
LAMB optimizer
2022-02-15 06:48:13 -07:00
James Betker
5175b7d91a
training sweeper check-in
2022-02-11 10:46:37 -07:00
James Betker
fbea6e8eac
Adjustments to diffusion networks
2022-01-30 16:14:06 -07:00
James Betker
e420df479f
Allow steps to specify which state keys to carry forward (reducing memory utilization)
2022-01-24 11:01:27 -07:00
James Betker
ce929a6b3f
Allow grad scaler to be enabled even in fp32 mode
2022-01-21 23:13:24 -07:00
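A sketch of how that can look: GradScaler does not require autocast, so keeping it enabled in fp32 mode lets one code path serve both precisions. Model, data, and the flag name are illustrative assumptions.

```python
import torch

model = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
use_fp16 = False  # train in fp32, but keep the scaler in the loop

scaler = torch.cuda.amp.GradScaler(enabled=True)

x = torch.randn(4, 16, device="cuda")
with torch.cuda.amp.autocast(enabled=use_fp16):
    loss = model(x).pow(2).mean()

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```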
James Betker
894d245062
More zero_grad fixes
2022-01-08 20:31:19 -07:00
James Betker
2a9a25e6e7
Fix likely defective nan grad recovery
2022-01-08 18:24:58 -07:00
James Betker
64cb4a92db
Support adamw_zero
2021-12-25 21:32:01 -07:00
James Betker
e7957e4897
Make loss accumulator for logs accumulate better
2021-12-12 22:23:17 -07:00
James Betker
79367f753d
Fix error & add nonfinite warning
2021-11-09 23:58:41 -07:00
James Betker
87364b890f
Add custom clip_grad_norm that prints out the param names in error.
2021-11-01 11:12:20 -06:00
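A hypothetical version of such a helper (the repo's own implementation may report differently): name every parameter whose gradient is non-finite, then fall back to the stock clipping call.

```python
import torch

def clip_grad_norm_verbose(model: torch.nn.Module, max_norm: float) -> torch.Tensor:
    """Like clip_grad_norm_, but first prints parameters with non-finite grads."""
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"Non-finite gradient in parameter: {name}")
    return torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)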
James Betker
82fc69abfa
Add "pure" evaluator
...
Which simply computes the training loss against an eval dataset
2021-08-09 14:58:35 -06:00
James Betker
4c98b9703f
Get dalle-style TTS to "work"
2021-08-03 21:08:27 -06:00
James Betker
965f6e6b52
Fixes to weight_decay in adamw
2021-07-31 15:58:41 -06:00
James Betker
1ff434218e
tacotron2, ready for prime time!
2021-07-08 22:13:44 -06:00
James Betker
6fd16ea9c8
Add meta-anomaly detection, colorjitter augmentation
2021-06-29 13:41:55 -06:00
James Betker
a57ed8e960
Various mods to support better jpeg image filtering
2021-06-25 13:16:15 -06:00
James Betker
9cfe840872
Attempt to fix syncing multiple times when doing gradient accumulation
2021-06-13 14:30:30 -06:00
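The usual fix for this class of problem (whether or not it is exactly what landed here) is to suppress the DDP all-reduce on intermediate accumulation micro-steps via no_sync(). `ddp_model`, `loader`, `loss_fn`, `optimizer`, and `accum_steps` are assumed placeholders.

```python
from contextlib import nullcontext

for i, (batch, target) in enumerate(loader):
    last_micro_step = (i + 1) % accum_steps == 0
    # Only all-reduce gradients on the final micro-step of each window.
    sync_ctx = nullcontext() if last_micro_step else ddp_model.no_sync()
    with sync_ctx:
        loss = loss_fn(ddp_model(batch), target) / accum_steps
        loss.backward()
    if last_micro_step:
        optimizer.step()
        optimizer.zero_grad()
```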
James Betker
3e3ad7825f
Add support for training an EMA network alongside the main networks
2021-06-12 21:01:41 -06:00
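A minimal EMA sketch for reference; the decay constant is an illustrative assumption, not necessarily the value used in this repo.

```python
import copy
import torch

model = torch.nn.Linear(16, 16)
ema_model = copy.deepcopy(model).eval()

@torch.no_grad()
def update_ema(ema: torch.nn.Module, live: torch.nn.Module, decay: float = 0.999):
    # Blend the shadow weights toward the live network after each step.
    for ema_p, live_p in zip(ema.parameters(), live.parameters()):
        ema_p.mul_(decay).add_(live_p, alpha=1.0 - decay)
```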
James Betker
65c474eecf
Various changes to fix testing
2021-06-11 15:31:10 -06:00
James Betker
692e9c417b
Support diffusion unet
2021-06-06 13:57:22 -06:00
James Betker
80d4404367
A few fixes:
...
- Output better prediction of xstart from eps
- Support LossAwareSampler
- Support AdamW
2021-06-05 13:40:32 -06:00
James Betker
34f8c8641f
Support training imagenet classifier
2021-01-11 20:09:16 -07:00
James Betker
edf9c38198
Make ExtensibleTrainer set the starting step for the LR scheduler
2021-01-02 22:22:34 -07:00
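A sketch, assuming this refers to resuming the LR schedule at the saved step rather than restarting it from zero; the scheduler type and milestones are illustrative.

```python
import torch

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
start_step = 10_000  # illustrative resume point

# Schedulers built with last_epoch != -1 expect 'initial_lr' on each param group.
for group in optimizer.param_groups:
    group.setdefault("initial_lr", group["lr"])

scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50_000, 100_000], gamma=0.5, last_epoch=start_step - 1)
```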
James Betker
bdbab65082
Allow optimizers to train separate param groups, add higher dimensional VGG discriminator
...
Did this to support training 512x512px networks off of a pretrained 256x256 network.
2021-01-02 15:10:06 -07:00
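Separate param groups in plain PyTorch look roughly like this; the modules and learning rates below are stand-ins, not the repo's networks.

```python
import torch

trunk = torch.nn.Linear(256, 256)       # stands in for the pretrained 256x256 network
new_layers = torch.nn.Linear(256, 512)  # stands in for the added 512x512 layers

# Each group can carry its own LR (and other hyperparameters).
optimizer = torch.optim.Adam([
    {"params": trunk.parameters(), "lr": 1e-5},
    {"params": new_layers.parameters(), "lr": 1e-4},
])
```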
James Betker
9c53314ea2
Add gradient penalty visual debug
2020-12-30 09:51:59 -07:00
James Betker
63cf3d3126
Injector auto-registration
...
I love it!
2020-12-29 20:58:02 -07:00
James Betker
036684893e
Add LARS optimizer & support for BYOL idiosyncrasies
...
- Added LARS and SGD optimizer variants that support turning off certain
features for BN and bias layers
- Added a variant of pytorch's resnet model that supports gradient checkpointing.
- Modified the trainer infrastructure to support the above
- Fixed a bug in BYOL (which should have rendered it nonfunctional)
2020-12-23 20:33:43 -07:00
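A hypothetical sketch of the BN/bias exclusion mentioned in the commit above: biases and 1-D (norm) parameters skip weight decay (and, in a LARS variant, its layer-wise adaptation), while everything else decays normally. Plain SGD stands in for LARS here.

```python
import torch

def split_param_groups(model: torch.nn.Module, weight_decay: float = 1e-4):
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(p)  # BN/LayerNorm weights and biases
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.BatchNorm1d(8))
optimizer = torch.optim.SGD(split_param_groups(model), lr=0.1, momentum=0.9)
```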
James Betker
5640e4efe4
More refactoring
2020-12-18 09:18:34 -07:00