James Betker
767f963392
Try to squeeze a bit more performance out of this arch
2022-07-20 23:51:11 -06:00
James Betker
b9d0f7e6de
simplify parameterization a bit
2022-07-20 23:41:54 -06:00
James Betker
ee8ceed6da
rework tfd13 further
...
- Use a gated activation layer for both attention & convs
- Add a relative, learned position bias. I believe this is similar to the T5 position encodings, but it is simpler and learned.
- Get rid of prepending to the attention matrix - it doesn't really work that well. The model eventually learns to attend one of its heads to these blocks, but if it's doing that, why not just concat?
2022-07-20 23:28:29 -06:00
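The commit above only names the position-bias change; as a hypothetical sketch (the function name, the one-scalar-per-offset table, and the shapes are my assumptions, not the repo's code), a "simpler than T5" relative position bias can be read as a directly learned scalar per relative offset, added to the attention logits:

```python
def relative_position_bias(seq_len, bias_table):
    # bias_table holds one learned scalar per relative offset in
    # [-(seq_len - 1), seq_len - 1]; offset d is stored at index d + seq_len - 1.
    # The resulting (seq_len x seq_len) matrix would be added to the
    # attention logits before softmax.
    return [
        [bias_table[j - i + seq_len - 1] for j in range(seq_len)]
        for i in range(seq_len)
    ]

# 3-token sequence: offsets -2..2 map to table indices 0..4
bias = relative_position_bias(3, [0.0, 0.1, 0.2, 0.3, 0.4])
```

Unlike T5's bucketed scheme, every offset here gets its own learned parameter; per-head tables and a maximum-offset clamp are omitted for brevity.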
James Betker
40427de8e3
update tfd13 for inference
2022-07-20 21:51:25 -06:00
James Betker
dbebe18602
Fix ts=0 with new formulation
2022-07-20 12:12:33 -06:00
James Betker
82bd62019f
diffuse the cascaded prior for continuous sr model
2022-07-20 11:54:09 -06:00
James Betker
b0e3be0a17
transition to nearest interpolation mode for downsampling
2022-07-20 10:56:17 -06:00
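For reference, nearest-mode downsampling just index-maps each output sample back to a source sample; a minimal plain-Python sketch (the helper name is mine, not the repo's), consistent with `F.interpolate(mode='nearest')` for integer factors:

```python
def nearest_downsample(signal, out_len):
    # Nearest-mode resampling: output index i reads source index
    # floor(i * in_len / out_len). No averaging, so no new values are
    # introduced - only existing samples are selected.
    in_len = len(signal)
    return [signal[i * in_len // out_len] for i in range(out_len)]

halved = nearest_downsample([10, 20, 30, 40, 50, 60], 3)  # factor-2 downsample
```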
James Betker
15decfdb98
misc
2022-07-20 10:19:02 -06:00
James Betker
fc0b291b21
do masking up properly
2022-07-19 16:32:17 -06:00
James Betker
c00398e955
scope attention in tfd13 as well
2022-07-19 14:59:43 -06:00
James Betker
6b1cfe8e66
ugh
2022-07-19 11:14:20 -06:00
James Betker
eecb534e66
a few fixes to multiresolution sr
2022-07-19 11:11:15 -06:00
James Betker
df27b98730
DDP doesn't like dropout on checkpointed values
2022-07-18 17:17:04 -06:00
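The commit doesn't say which failure mode was hit, but one known hazard of dropout inside activation checkpointing is RNG drift: the backward pass reruns the forward, and if the RNG state isn't restored, the recomputed dropout mask disagrees with the one the graph was built from. A hypothetical plain-Python illustration (not the repo's code; `torch.utils.checkpoint`'s `preserve_rng_state` guards against exactly this):

```python
import random

def dropout(x, p, rng):
    # Inverted dropout: zero each element with prob p, scale survivors by 1/(1-p).
    return [0.0 if rng.random() < p else v / (1 - p) for v in x]

rng = random.Random(0)
state = rng.getstate()                # snapshot RNG before the forward pass
first = dropout([1.0] * 8, 0.5, rng)  # mask used in the real forward

# Checkpointing reruns the forward during backward. Without restoring the
# RNG, fresh draws produce a different mask than the original pass.
recomputed_drifted = dropout([1.0] * 8, 0.5, rng)

rng.setstate(state)                   # restore, as preserve_rng_state does
recomputed_correct = dropout([1.0] * 8, 0.5, rng)
```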
James Betker
c959e530cb
good ole DDP..
2022-07-18 17:13:45 -06:00
James Betker
cf57c352c8
Another fix
2022-07-18 17:09:13 -06:00
James Betker
83a4ef4149
default to use input for conditioning & add preprocessed input to GDI
2022-07-18 17:01:19 -06:00
James Betker
1b4d9567f3
tfd13 for multi-resolution superscaling
2022-07-18 16:36:22 -06:00