Commit Graph

2096 Commits

Author SHA1 Message Date
mrq
1433b7c0ea working Embedding override 2023-02-23 07:28:27 +00:00
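The commit doesn't show what the override actually is; one plausible shape, purely as an assumption, is an nn.Embedding subclass whose weight can be swapped at runtime:

```python
# Purely hypothetical sketch of an "Embedding override"; this is not the
# repo's actual class or API.
import torch.nn as nn
import torch.nn.functional as F

class OverridableEmbedding(nn.Embedding):
    def __init__(self, num_embeddings, embedding_dim):
        super().__init__(num_embeddings, embedding_dim)
        self.override = None  # optional replacement weight tensor

    def forward(self, tokens):
        weight = self.override if self.override is not None else self.weight
        return F.embedding(tokens, weight)
```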
mrq
fd66c4104b ugh 2023-02-23 07:18:07 +00:00
mrq
7bcedca771 I guess I can't easily toggle it outside of here, but it works 2023-02-23 07:02:06 +00:00
mrq
58600274ac Disabling bitsandbytes optimization as default for now, on the off chance that it actually produces garbage (which shouldn't happen; if training at float16 from a model at float16 works fine, then this has to work) 2023-02-23 03:22:59 +00:00
mrq
6676c89c0e I sucked off the hypothetical wizard again: just using BNB's ADAM optimizer nets HUGE savings, but I don't know the output costs; will need to test 2023-02-23 02:42:17 +00:00
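These two commits describe swapping in bitsandbytes' 8-bit Adam and then gating it behind a default-off switch. A minimal sketch of that pattern, assuming a hypothetical `use_bitsandbytes` flag (the repo's actual setting name is not shown here):

```python
# Hedged sketch only: `use_bitsandbytes` is a hypothetical flag, not the
# repo's actual configuration key.
import torch

use_bitsandbytes = False  # default off until output quality is verified

def build_optimizer(params, lr=1e-4):
    if use_bitsandbytes:
        import bitsandbytes as bnb
        # 8-bit optimizer states cut VRAM use substantially vs. torch.optim.Adam.
        return bnb.optim.Adam8bit(params, lr=lr)
    return torch.optim.Adam(params, lr=lr)
```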
mrq
4427d7fb84 initial conversion (errors out) 2023-02-22 23:07:05 +00:00
mrq
6c284ef8ec oops 2023-02-18 03:27:04 +00:00
mrq
8db762fa17 thought I copied this over 2023-02-18 03:15:44 +00:00
mrq
73d9c3bd46 set output folder to be sane with the cwd as a reference point 2023-02-18 02:01:09 +00:00
mrq
5ecf7da881 Fix later 2023-02-17 20:49:29 +00:00
mrq
e3e8801e5f Fix I thought wasn't needed since it literally worked without it earlier 2023-02-17 20:41:20 +00:00
mrq
535549c3f3 add some snark about the kludge I had to fix, and the kludge I used to fix it 2023-02-17 19:20:19 +00:00
mrq
a09cf98c7f more cleanup, pip-ifying won't work, got an alternative 2023-02-17 15:47:55 +00:00
mrq
6afa2c299e break if your dataset size is smaller than your batch size 2023-02-17 04:08:27 +00:00
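Whether this commit notes the failure or adds the check, the fail-fast version of the guard it describes would look something like this (names are illustrative, not from the repo):

```python
# Illustrative fail-fast check for the dataset-vs-batch-size condition above.
def check_dataset(dataset, batch_size):
    if len(dataset) < batch_size:
        raise ValueError(
            f"dataset has {len(dataset)} samples but batch_size is {batch_size}; "
            "shrink the batch size or add more data"
        )
```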
mrq
94d0f16608 Necessary fixes to get it to work 2023-02-17 02:03:00 +00:00
mrq
49e23b226b pip-ify 2023-02-17 00:33:50 +00:00
James Betker
f31a333c4f more sampling fixes 2022-10-10 20:11:28 -06:00
James Betker
5d172fbf7e Fix eval 2022-10-10 14:22:36 -06:00
James Betker
9502e0755e ugh 2022-10-10 12:15:51 -06:00
James Betker
fce2c8f5db and listify them 2022-10-10 12:13:49 -06:00
James Betker
3cf78e3c44 train mel head even when not 2022-10-10 12:10:56 -06:00
James Betker
cc74a43675 Checkin 2022-10-10 11:30:20 -06:00
James Betker
3cb14123bc glc fix 2022-07-29 11:24:36 -06:00
James Betker
4ddd01a7fb support generating cheaters from the new cheater network 2022-07-29 09:19:20 -06:00
James Betker
27a9b1b750 rename perplexity->log perplexity 2022-07-28 09:48:40 -06:00
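The rename makes sense because what the evals actually accumulate is the mean negative log-likelihood, i.e. the log of perplexity; exponentiating recovers perplexity itself. A small illustration, not the repo's code:

```python
import torch.nn.functional as F

def log_perplexity(logits, targets):
    # mean negative log-likelihood == log(perplexity)
    return F.cross_entropy(logits, targets)

# perplexity itself, if ever needed for reporting:
# log_perplexity(logits, targets).exp()
```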
James Betker
1d68624828 fix some imports.. 2022-07-28 02:35:32 -06:00
James Betker
cfe907f13f i like this better 2022-07-28 02:33:23 -06:00
James Betker
d44ed5d12d probably too harsh on ninfs 2022-07-28 01:33:54 -06:00
James Betker
4509cfc705 track logperp for diffusion evals 2022-07-28 01:30:44 -06:00
James Betker
19eb939ccf gd perplexity
# Conflicts:
#	codes/trainer/eval/music_diffusion_fid.py
2022-07-28 00:25:05 -06:00
James Betker
a1bbde8a43 few things 2022-07-26 11:52:03 -06:00
James Betker
f8108cfdb2 update environment and fix a bunch of deps 2022-07-24 23:43:25 -06:00
James Betker
45afefabed fix booboo 2022-07-24 18:00:14 -06:00
James Betker
cc62ba9cba few more tfd13 things 2022-07-24 17:39:33 -06:00
James Betker
f3d967dbf5 remove eta from mdf 2022-07-24 17:21:20 -06:00
James Betker
76464ca063 some fixes to mdf to support new archs 2022-07-21 10:55:50 -06:00
James Betker
13c263e9fb go all in on m2wv3 2022-07-21 00:51:27 -06:00
James Betker
24a78bd7d1 update tfd14 too 2022-07-21 00:45:33 -06:00
James Betker
02ebda42f2 #yolo 2022-07-21 00:43:03 -06:00
James Betker
b92ff8de78 misc 2022-07-20 23:59:32 -06:00
James Betker
a1743d26aa Revert "Try to squeeze a bit more performance out of this arch"
This reverts commit 767f963392.
2022-07-20 23:57:56 -06:00
James Betker
767f963392 Try to squeeze a bit more performance out of this arch 2022-07-20 23:51:11 -06:00
James Betker
b9d0f7e6de simplify parameterization a bit 2022-07-20 23:41:54 -06:00
James Betker
ee8ceed6da rework tfd13 further
- use a gated activation layer for both attention & convs
- add a relative learned position bias; I believe this is similar to the T5 position encodings, but it is simpler and learned (a sketch of both ideas follows this entry)
- get rid of prepending to the attention matrix; this doesn't really work that well. The model eventually learns to attend to these blocks with one of its heads, but why not just concat if it is doing that?
2022-07-20 23:28:29 -06:00
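A hedged sketch of the first two ideas in the commit above, not the actual tfd13 code; class names and shapes are assumptions:

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Gated (GLU-style) activation: output split into value and gate halves."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels * 2, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):  # x: (batch, channels, length)
        value, gate = self.conv(x).chunk(2, dim=1)
        return value * torch.sigmoid(gate)

class LearnedRelativeBias(nn.Module):
    """One learned scalar per head and relative offset, added to attention logits."""
    def __init__(self, heads, max_len):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(heads, 2 * max_len - 1))
        self.max_len = max_len

    def forward(self, length):
        # relative offsets j - i range over [-(length-1), length-1]
        idx = torch.arange(length)
        rel = idx[None, :] - idx[:, None] + self.max_len - 1  # (length, length)
        return self.bias[:, rel]  # (heads, length, length)
```

The simplification relative to T5 would be that each relative offset gets its own learned scalar directly, rather than T5's bucketed position scheme.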
James Betker
40427de8e3 update tfd13 for inference 2022-07-20 21:51:25 -06:00
James Betker
dbebe18602 Fix ts=0 with new formulation 2022-07-20 12:12:33 -06:00
James Betker
82bd62019f diffuse the cascaded prior for continuous sr model 2022-07-20 11:54:09 -06:00
James Betker
b0e3be0a17 transition to nearest interpolation mode for downsampling 2022-07-20 10:56:17 -06:00
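The change described here corresponds to something like the following (a sketch; the repo's actual call site is not shown):

```python
import torch.nn.functional as F

def downsample(x, factor=2):
    # x: (batch, channels, ...spatial); 'nearest' picks existing samples
    # instead of blending neighbors as bilinear-style modes do.
    return F.interpolate(x, scale_factor=1.0 / factor, mode="nearest")
```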
James Betker
7b3fc79737 iq checkin 2022-07-20 10:19:32 -06:00
James Betker
9a37f3ba42 reminder to future self 2022-07-20 10:19:15 -06:00