forked from mrq/DL-Art-School
Commit ee8ceed6da
- Use a gated activation layer for both attention and convs.
- Add a relative, learned position bias. I believe this is similar to the T5 position encodings, but it is simpler and learned.
- Get rid of prepending to the attention matrix; this doesn't really work that well. The model eventually learns to attend one of its heads to these blocks, but why not just concat if it is doing that?
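Below is a minimal sketch of the first two changes, not the actual DL-Art-School code: a GLU-style gated activation and a learned relative position bias added to the attention logits. All class names and parameters here (GatedActivation, LearnedRelativePositionBias, max_distance) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GatedActivation(nn.Module):
    """GLU-style gate: half the projected channels gate the other half."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv1d(dim, dim * 2, kernel_size=1)

    def forward(self, x):  # x: (batch, dim, time)
        value, gate = self.proj(x).chunk(2, dim=1)
        return value * torch.sigmoid(gate)


class LearnedRelativePositionBias(nn.Module):
    """One learned scalar per (head, clamped relative distance), added to the
    attention logits. Simpler than T5's bucketed relative position scheme."""
    def __init__(self, heads, max_distance=128):
        super().__init__()
        self.max_distance = max_distance
        self.bias = nn.Embedding(2 * max_distance + 1, heads)

    def forward(self, seq_len, device):
        pos = torch.arange(seq_len, device=device)
        rel = pos[None, :] - pos[:, None]  # (query, key) offsets
        rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
        return self.bias(rel).permute(2, 0, 1)  # (heads, query, key)


class BiasedSelfAttention(nn.Module):
    """Plain multi-head self-attention with the learned bias on the logits."""
    def __init__(self, dim, heads=8, max_distance=128):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)
        self.rel_bias = LearnedRelativePositionBias(heads, max_distance)

    def forward(self, x):  # x: (batch, time, dim)
        b, t, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (y.view(b, t, self.heads, -1).transpose(1, 2) for y in (q, k, v))
        logits = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        logits = logits + self.rel_bias(t, x.device)  # broadcast over batch
        attn = logits.softmax(dim=-1)
        out = torch.matmul(attn, v).transpose(1, 2).reshape(b, t, d)
        return self.to_out(out)
```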
__init__.py
audio_diffusion_fid.py
eval_wer.py
evaluator.py
fid.py
flow_gaussian_nll.py
mel_evaluator.py
music_diffusion_fid.py
single_point_pair_contrastive_eval.py
sr_diffusion_fid.py
sr_fid.py
sr_style.py