forked from mrq/DL-Art-School
- use a gated activation layer for both the attention and convolution blocks
- add a relative learned position bias; I believe this is similar to the T5 position encodings, but it is simpler and fully learned (a sketch of this bias together with the gated activation follows below)
- get rid of prepending to the attention matrix, which doesn't work very well in practice. The model eventually learns to dedicate one of its heads to attending to the prepended blocks, and if that is all it is doing, plain concatenation achieves the same thing more directly.
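The gated activation and the learned relative position bias are easier to see in code. The sketch below is an illustrative assumption of how the two ideas could fit together in PyTorch, not this repository's actual implementation: the class names, the single-head attention, the scalar-per-offset bias table, and the `max_rel_dist` clamp are all hypothetical choices made for brevity.

```python
# Minimal sketch (not the repo's code) of a GLU-style gated activation and a
# learned relative position bias added to the attention logits.
import torch
import torch.nn as nn


class GatedActivation(nn.Module):
    """Projects to 2x channels, then gates one half with a sigmoid of the other."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim * 2)

    def forward(self, x):
        a, b = self.proj(x).chunk(2, dim=-1)
        return a * torch.sigmoid(b)


class RelPosSelfAttention(nn.Module):
    """Single-head self-attention with a learned relative position bias.

    Unlike T5's bucketed bias, this learns one scalar per clamped relative
    offset, which is one reading of "simpler and learned".
    """
    def __init__(self, dim, max_rel_dist=128):
        super().__init__()
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.max_rel_dist = max_rel_dist
        # One learned bias per relative offset in [-max_rel_dist, max_rel_dist].
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_rel_dist + 1))
        self.gate = GatedActivation(dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.einsum('bid,bjd->bij', q, k) * self.scale

        # Relative offsets j - i, clamped and shifted into the bias table.
        pos = torch.arange(t, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist,
                                                  self.max_rel_dist)
        logits = logits + self.rel_bias[rel + self.max_rel_dist]

        attn = logits.softmax(dim=-1)
        out = torch.einsum('bij,bjd->bid', attn, v)
        return self.gate(out)                   # gated activation on the output
```

Compared with T5's bucketed relative bias, a single table of one learned scalar per clamped offset needs no bucketing logic at all. The same `GatedActivation` module can also be dropped in after a convolutional block in place of a plain nonlinearity, covering the "both attention & convs" part of the change.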
Directory contents:

- __init__.py
- cheater_gen_ar.py
- diffwave.py
- encoders.py
- flat_diffusion.py
- gpt_music.py
- gpt_music2.py
- instrument_quantizer.py
- m2v_code_to_mel.py
- mel2vec_codes_gpt.py
- music_quantizer.py
- music_quantizer2.py
- tfdpc_v5.py
- transformer_diffusion12.py
- transformer_diffusion13.py
- transformer_diffusion14.py
- unet_diffusion_music_codes.py
- unet_diffusion_waveform_gen_simple.py
- unet_diffusion_waveform_gen.py
- unet_diffusion_waveform_gen3.py