DL-Art-School/codes
James Betker ee8ceed6da 2022-07-20 23:28:29 -06:00
rework tfd13 further
- use a gated activation layer for both attention & convs (see the first sketch below)
- add a relative learned position bias (second sketch below). I believe this is similar to the T5 position encodings, but it is simpler and fully learned
- get rid of prepending to the attention matrix - this doesn't really work that well. the model eventually learns to attend to these blocks with one of its heads, so why not just concat if that is what it is doing? (third sketch below)
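A minimal sketch of what the "gated activation layer" could look like, assuming the common GLU style: the layer produces a value half and a gate half, and the sigmoid of the gate modulates the value. The class name and the choice of a 1x1 conv are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    """GLU-style gate: output = value * sigmoid(gate). Hypothetical sketch."""
    def __init__(self, channels):
        super().__init__()
        # Double the width so the output can be split into value and gate halves.
        self.proj = nn.Conv1d(channels, channels * 2, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, time)
        value, gate = self.proj(x).chunk(2, dim=1)
        return value * torch.sigmoid(gate)
```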
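A sketch of the "relative learned position bias" under the simplest reading of the bullet: one learned scalar per head per clamped relative offset, added to the attention logits before the softmax. T5 buckets offsets logarithmically; this version skips the bucketing, which is one way to read "simpler and learned". All names are illustrative.

```python
import torch
import torch.nn as nn

class LearnedRelativePositionBias(nn.Module):
    """One learned bias per head per relative offset. Hypothetical sketch."""
    def __init__(self, heads, max_distance):
        super().__init__()
        self.max_distance = max_distance
        # One bias per head for each offset in [-max_distance, max_distance].
        self.bias = nn.Parameter(torch.zeros(heads, 2 * max_distance + 1))

    def forward(self, seq_len):
        pos = torch.arange(seq_len, device=self.bias.device)
        # Relative offset of every key position from every query position.
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_distance, self.max_distance)
        rel = rel + self.max_distance  # shift into [0, 2 * max_distance]
        # (heads, seq_len, seq_len); add this to the attention logits pre-softmax.
        return self.bias[:, rel]
```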
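A rough illustration of the prepend-vs-concat point, under the assumption that "prepending" means adding conditioning tokens to the sequence attention operates over, while "concat" means attaching the conditioning to the features directly. All tensors and shapes here are dummies, not the repository's actual code.

```python
import torch

B, T, C = 2, 16, 64
x = torch.randn(B, T, C)            # sequence features
cond_tokens = torch.randn(B, 4, C)  # conditioning blocks

# Before: prepend conditioning to the attention sequence, so the model must
# learn to route one of its heads toward these extra positions.
kv = torch.cat([cond_tokens, x], dim=1)           # (B, 4 + T, C)

# After: concatenate a pooled conditioning vector onto every position's
# features, so no attention routing is needed to reach it.
cond_vec = cond_tokens.mean(dim=1, keepdim=True)  # (B, 1, C)
x_cat = torch.cat([x, cond_vec.expand(-1, T, -1)], dim=-1)  # (B, T, 2 * C)
```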
.idea
data
models
scripts
trainer
utils
multi_modal_train.py
process_video.py
requirements.txt
sweep.py
test.py
train.py
use_discriminator_as_filter.py