Commit Graph

67 Commits

Author SHA1 Message Date
mrq
6676c89c0e I sucked off the hypothetical wizard again, just using BNB's ADAM optimizer nets HUGE savings, but I don't know the output costs, will need to test 2023-02-23 02:42:17 +00:00
mrq
4427d7fb84 initial conversion (errors out) 2023-02-22 23:07:05 +00:00
James Betker
a1bbde8a43 few things 2022-07-26 11:52:03 -06:00
James Betker
78bba690de auto grad "lr" scaling 2022-07-08 00:38:25 -06:00
James Betker
7a36668870 whoops! 2022-06-12 21:11:34 -06:00
James Betker
efabcf5008 When ema is on CPU, only update every 10 steps. 2022-06-12 18:34:58 -06:00
James Betker
3db862dd32 adf update 2022-05-27 09:25:53 -06:00
James Betker
7213ad2b89 Do grad reduction 2022-05-17 17:59:40 -06:00
James Betker
07731d5491 Fix ET 2022-03-24 21:20:22 -06:00
James Betker
963f0e9cee fix unscaler 2022-03-22 11:40:02 -06:00
James Betker
428911cd4d flat diffusion network 2022-03-17 10:53:56 -06:00
James Betker
8b376e63d9 More improvements 2022-03-16 10:16:34 -06:00
James Betker
6000580e2e df 2022-03-04 13:47:00 -07:00
James Betker
e1052a5e32 Move log consensus to train for efficiency 2022-03-04 13:41:32 -07:00
James Betker
ce6dfdf255 Distributed "fixes" 2022-03-04 12:46:41 -07:00
James Betker
3ff878ae85 Accumulate loss & grad_norm metrics from all entities within a distributed graph 2022-03-04 12:01:16 -07:00
James Betker
f490eaeba7 Shuffle optimizer states back and forth between cpu memory during steps 2022-03-04 10:38:51 -07:00
James Betker
3c242403f5 adjust location of pre-optimizer step so I can visualize the new grad norms 2022-03-04 08:56:42 -07:00
James Betker
6873ad6660 Support functionality 2022-03-03 21:52:16 -07:00
James Betker
70fa780edb Add mechanism to export grad norms 2022-03-01 20:19:52 -07:00
James Betker
bcba65c539 DataParallel Fix 2022-02-19 20:36:35 -07:00
James Betker
34001ad765 et 2022-02-18 18:52:33 -07:00
James Betker
2bdb515068 A few mods to make wav2vec2 trainable with DDP on DLAS 2022-02-15 06:28:54 -07:00
James Betker
52b61b9f77 Update scripts and attempt to figure out how UnifiedVoice could be used to produce CTC codes 2022-02-13 20:48:06 -07:00
James Betker
15fd60aad3 Allow EMA training to be disabled 2022-02-12 20:00:23 -07:00
James Betker
3d946356f8 batch_size_optimizer works. sweet! no more tuning batch sizes. 2022-02-09 14:26:23 -07:00
James Betker
18938248e4 Add batch_size_optimizer support 2022-02-08 23:51:31 -07:00
James Betker
dfef34ba39 Load ema to cpu memory if specified 2022-01-24 15:08:29 -07:00
James Betker
3e16c509f6 Misc fixes 2022-01-24 14:31:43 -07:00
James Betker
62475005e4 Sort data items in descending order, which I suspect will improve performance because we will hit GC less 2022-01-23 19:05:32 -07:00
James Betker
bcd8cc51e1 Enable collated data for diffusion purposes 2022-01-19 00:35:08 -07:00
James Betker
f4484fd155 Add "dataset_debugger" support
This allows the datasets themselves to compile statistics and report them
via tensorboard and wandb.
2022-01-06 12:38:20 -07:00
James Betker
776a7abfcc Support torch DDP _set_static_graph 2021-12-25 21:20:06 -07:00
James Betker
32cfcf3684 Turn off optimization in find_faulty_files 2021-12-09 09:02:09 -07:00
James Betker
ee9b199d2b Build in capacity to revert & resume networks that encounter a NaN
I'm increasingly seeing issues where something like this can be useful. In many (most?)
cases it's just a waste of compute, though. Still, better than a cold computer for a whole
night.
2021-11-01 16:14:59 -06:00
James Betker
b404a3b747 Revert recent changes to extr 2021-10-30 20:48:06 -06:00
James Betker
e9dc37f19c Mod trainer to copy config file into experiments root 2021-10-30 17:00:24 -06:00
James Betker
5c8d266d4f chk 2021-09-17 09:15:36 -06:00
James Betker
94899d88f3 Fix overuse of checkpointing 2021-09-16 23:00:28 -06:00
James Betker
3e073cff85 Set kernel_size in diffusion_vocoder 2021-09-01 08:33:46 -06:00
James Betker
04d14b3acc No batch factors for eval 2021-08-09 16:02:01 -06:00
James Betker
82fc69abfa Add "pure" evaluator
Which simply computes the training loss against an eval dataset
2021-08-09 14:58:35 -06:00
James Betker
1ff434218e tacotron2, ready for prime time! 2021-07-08 22:13:44 -06:00
James Betker
6fd16ea9c8 Add meta-anomaly detection, colorjitter augmentation 2021-06-29 13:41:55 -06:00
James Betker
5b4f86293f Add FID evaluator for diffusion models 2021-06-14 09:14:30 -06:00
James Betker
9cfe840872 Attempt to fix syncing multiple times when doing gradient accumulation 2021-06-13 14:30:30 -06:00
James Betker
3e3ad7825f Add support for training an EMA network alongside the main networks 2021-06-12 21:01:41 -06:00
James Betker
696f320820 Get rid of feature networks 2021-06-11 20:50:07 -06:00
James Betker
65c474eecf Various changes to fix testing 2021-06-11 15:31:10 -06:00
James Betker
2ad2b56438 Don't do wandb except on rank 0 2021-06-06 16:52:07 -06:00