Commit Graph

21 Commits

Author SHA1 Message Date
mrq
6676c89c0e I sucked off the hypothetical wizard again: just using BNB's ADAM optimizer nets HUGE savings, but I don't know the cost to output quality; will need to test 2023-02-23 02:42:17 +00:00
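A minimal sketch of the swap this commit describes, assuming a standard PyTorch loop: bitsandbytes (BNB) ships an 8-bit Adam (`bnb.optim.Adam8bit`) that quantizes optimizer state, which is where the memory savings come from. The model and hyperparameters below are placeholders, not the repo's actual config.

```python
# Hypothetical example of swapping torch.optim.Adam for bitsandbytes' 8-bit Adam.
# pip install bitsandbytes
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()  # stand-in for the real network

# Before: fp32 optimizer state (two extra fp32 buffers per parameter).
# opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

# After: 8-bit optimizer state; a drop-in replacement with the same signature.
opt = bnb.optim.Adam8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

x = torch.randn(8, 512).cuda()
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```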
mrq
4427d7fb84 initial conversion (errors out) 2023-02-22 23:07:05 +00:00
James Betker
f8108cfdb2 update environment and fix a bunch of deps 2022-07-24 23:43:25 -06:00
James Betker
02ebda42f2 #yolo 2022-07-21 00:43:03 -06:00
James Betker
ee8ceed6da rework tfd13 further
- use a gated activation layer for both attention & convs (sketched below this entry)
- add a learned relative position bias (also sketched below). I believe this is similar to the T5 position encodings, but simpler
- get rid of prepending to the attention matrix - it doesn't work that well. The model eventually learns to point one of its heads at the prepended blocks, but if it is going to do that anyway, why not just concat?
2022-07-20 23:28:29 -06:00
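Minimal sketches of the two mechanisms named above, under the assumption that "gated activation" means the usual tanh-sigmoid gate and that the position bias is a plain embedding over relative offsets; class names are illustrative, not the actual tfd13 code.

```python
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    """Half the channels gate the other half (tanh * sigmoid)."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b)

class LearnedRelativePositionBias(nn.Module):
    """A learned bias added to attention logits, indexed by relative offset.
    Similar in spirit to T5's position buckets, but a plain embedding table."""
    def __init__(self, num_heads, max_len):
        super().__init__()
        self.max_len = max_len
        self.bias = nn.Embedding(2 * max_len - 1, num_heads)

    def forward(self, seq_len):
        pos = torch.arange(seq_len)
        rel = pos[None, :] - pos[:, None] + self.max_len - 1  # offsets -> [0, 2*max_len-2]
        return self.bias(rel).permute(2, 0, 1)  # (heads, seq, seq)

# Usage inside attention, where logits is (batch, heads, seq, seq):
#   logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
#   logits = logits + pos_bias(seq_len)  # broadcasts over batch
```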
James Betker
c00398e955 scope attention in tfd13 as well 2022-07-19 14:59:43 -06:00
James Betker
b157b28c7b tfd14
hopefully this helps address the positional dependencies of tfd12
2022-07-19 13:30:05 -06:00
James Betker
1b4d9567f3 tfd13 for multi-resolution superscaling 2022-07-18 16:36:22 -06:00
James Betker
15831b2576 some stuff 2022-07-13 21:26:25 -06:00
James Betker
28d95e3141 gptmusic work 2022-06-16 15:09:47 -06:00
James Betker
c61cd64bc9 network updates 2022-06-08 09:26:59 -06:00
James Betker
49568ee16f some updates 2022-06-06 09:13:47 -06:00
James Betker
1f521d6a1d add reconstruction loss to m2v 2022-05-23 09:28:41 -06:00
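The message is terse; generically, adding a reconstruction loss means mixing a distance between the model's reconstruction and its input into the existing objective. A sketch with placeholder tensors and an assumed weight, not the actual m2v code:

```python
import torch
import torch.nn.functional as F

# Placeholders standing in for the real training state.
primary_loss = torch.tensor(1.0, requires_grad=True)         # existing objective
reconstruction = torch.randn(4, 16000, requires_grad=True)   # decoder output
target = torch.randn(4, 16000)                               # the original input

recon_weight = 0.1  # assumed weighting, not from the commit
total_loss = primary_loss + recon_weight * F.mse_loss(reconstruction, target)
total_loss.backward()
```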
James Betker
47662b9ec5 some random crap 2022-05-04 20:29:23 -06:00
James Betker
d186414566 More spring cleaning 2022-03-16 12:04:00 -06:00
James Betker
8ada52ccdc Update LR layers to checkpoint better 2022-01-22 08:22:57 -07:00
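"Checkpoint better" most plausibly refers to activation (gradient) checkpointing rather than saving model checkpoints; if that reading is right, a minimal sketch with `torch.utils.checkpoint` (the layer here is a placeholder, not the repo's LR layers):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Placeholder stand-in for an LR (low-resolution) layer.
layer = torch.nn.Sequential(
    torch.nn.Conv2d(64, 64, 3, padding=1),
    torch.nn.GroupNorm(8, 64),
    torch.nn.LeakyReLU(0.2),
)

x = torch.randn(1, 64, 32, 32, requires_grad=True)
# Recompute activations during backward instead of storing them:
# trades compute for memory.
y = checkpoint(layer, x, use_reentrant=False)
y.mean().backward()
```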
James Betker
f2a31702b5 Clean stuff up, move more things into arch_util 2021-10-20 21:19:25 -06:00
James Betker
a6f0f854b9 Fix codes when inferring from dvae 2021-10-17 22:51:17 -06:00
James Betker
398185e109 More work on wave-diffusion 2021-07-27 05:36:17 -06:00
James Betker
4328c2f713 Change default ReLU slope to 0.2 BREAKS COMPATIBILITY
This conforms my ConvGnLelu implementation to the generally accepted negative_slope=0.2; I have no idea where I got 0.1. It will break backwards compatibility with some older models, but should improve their performance when freshly trained. I audited which models these might be, and I am not actively using any of them, so this is probably OK. (A sketch of the block follows this entry.)
2020-12-19 08:28:03 -07:00
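For reference, a sketch of a Conv → GroupNorm → LeakyReLU block like the `ConvGnLelu` named above, with the corrected slope; sizes and defaults are assumptions, not the repo's exact class.

```python
import torch
import torch.nn as nn

class ConvGnLelu(nn.Module):
    """Conv2d -> GroupNorm -> LeakyReLU. Illustrative only."""
    def __init__(self, in_ch, out_ch, groups=8):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gn = nn.GroupNorm(groups, out_ch)
        # The change this commit makes: negative_slope=0.2 instead of 0.1.
        self.lelu = nn.LeakyReLU(negative_slope=0.2)

    def forward(self, x):
        return self.lelu(self.gn(self.conv(x)))

# Smoke test with placeholder shapes.
y = ConvGnLelu(3, 16)(torch.randn(1, 3, 32, 32))
```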
James Betker
5640e4efe4 More refactoring 2020-12-18 09:18:34 -07:00