DL-Art-School

Author	SHA1	Message	Date
James Betker	da53090ce6	More adjustments to support distributed training with teco & on multi_modal_train	2020-10-27 20:58:03 -06:00
James Betker	2a3eec8fd7	Fix some distributed training snafus	2020-10-27 15:24:05 -06:00
James Betker	15e00e9014	Finish integration with autocast. Note: autocast is broken when also using checkpoint(). Overcome this by modifying torch's checkpoint() function in place to also use autocast.	2020-10-22 14:39:19 -06:00
James Betker	d7ee14f721	Move to torch.cuda.amp (not working). Running into OOM errors, needs diagnosing. Checkpointing here.	2020-10-22 13:58:05 -06:00
James Betker	24792bdb4f	Codebase cleanup. Removed a lot of legacy stuff I have no intent on using again. Plan is to shape this repo into something more extensible (get it? hah!)	2020-10-13 20:56:39 -06:00
James Betker	8014f050ac	Clear metrics properly. Holy cow, what a PITA bug.	2020-10-13 10:07:49 -06:00
James Betker	8197fd646f	Don't accumulate losses for metrics when the loss isn't a tensor	2020-10-03 11:03:55 -06:00
James Betker	39865ca3df	TOTAL_loss, dumbo	2020-10-02 21:06:10 -06:00
James Betker	4e44fcd655	Loss accumulator fix	2020-10-02 20:55:33 -06:00
James Betker	567b4d50a4	ExtensibleTrainer - don't compute backward when there is no loss	2020-10-02 20:54:06 -06:00
James Betker	dc8f3b24de	Don't let duplicate keys be used for injectors and losses	2020-09-29 16:59:44 -06:00
James Betker	f9b83176f1	Fix bugs in extensibletrainer	2020-09-28 22:09:42 -06:00
James Betker	31641d7f63	Add ImagePatchInjector and TranslationalLoss	2020-09-26 21:25:32 -06:00
James Betker	6d0490a0e6	Tecogan implementation work	2020-09-25 16:38:23 -06:00
James Betker	f40beb5460	Add 'before' and 'after' defs to injections, steps and optimizers	2020-09-22 17:03:22 -06:00
James Betker	e9a39bfa14	Recursively detach all outputs, even if they are nested in data structures	2020-09-19 21:47:34 -06:00
James Betker	9a17ade550	Some convenience adjustments to ExtensibleTrainer	2020-09-17 21:05:32 -06:00
James Betker	5b85f891af	Only log the name of the first network in the total_loss training set	2020-09-12 16:07:09 -06:00
James Betker	fb595e72a4	Supporting infrastructure in ExtensibleTrainer to train spsr4. Need to be able to train 2 nets in one step: the backbone will be entirely separate with its own optimizer (for an extremely low LR). This functionality was already present, just not implemented correctly.	2020-09-11 22:57:06 -06:00
James Betker	5189b11dac	Add combined dataset for training across multiple datasets	2020-09-11 08:44:06 -06:00
James Betker	3027e6e27d	Enable amp to be disabled	2020-09-09 10:45:59 -06:00
James Betker	c04f244802	More mods	2020-09-08 20:36:27 -06:00
James Betker	e8613041c0	Add novograd optimizer	2020-09-06 17:27:08 -06:00
James Betker	21ae135f23	Allow Novograd to be used as an optimizer	2020-09-05 16:50:13 -06:00
James Betker	0dfd8eaf3b	Support injectors that run in eval only	2020-09-05 07:59:45 -06:00
James Betker	4b4d08bdec	Enable testing in ExtensibleTrainer, fix it in SRGAN_model. Also compute fea loss for this.	2020-08-31 09:41:48 -06:00
James Betker	dffc15184d	More ExtensibleTrainer work. It runs now, just need to debug it to reach performance parity with SRGAN. Sweet.	2020-08-23 17:22:45 -06:00
James Betker	e59e712e39	More ExtensibleTrainer work	2020-08-22 13:08:33 -06:00
James Betker	f40545f235	ExtensibleTrainer work	2020-08-22 08:24:34 -06:00
James Betker	74cdaa2226	Some work on extensible trainer	2020-08-18 08:49:32 -06:00
James Betker	ab04ca1778	Extensible trainer (in progress)	2020-08-12 08:45:23 -06:00

31 Commits