Commit Graph

2137 Commits

Author SHA1 Message Date
mrq
0f04206aa2 added ability to toggle some settings with envvars for later testing without needing to manually edit this file (and some other things like disabling it when a user requests it in the future) 2023-02-24 23:08:56 +00:00
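The envvar toggles described in the commit above can be sketched like this; the variable names (e.g. MRQ_USE_BITSANDBYTES) are illustrative placeholders, not the repo's actual settings:

    # Minimal sketch of overriding defaults via environment variables so settings can be
    # toggled for testing without editing the file. Variable names here are hypothetical.
    import os

    def env_flag(name, default):
        # Accept common truthy strings; anything unset falls back to the hard-coded default.
        value = os.environ.get(name)
        if value is None:
            return default
        return value.strip().lower() in ("1", "true", "yes", "on")

    USE_BITSANDBYTES = env_flag("MRQ_USE_BITSANDBYTES", False)
    LOW_VRAM = env_flag("MRQ_LOW_VRAM", False)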
mrq
1433b7c0ea working Embedding override 2023-02-23 07:28:27 +00:00
mrq
94aefa3e4c silence 2023-02-23 07:25:09 +00:00
mrq
fd66c4104b ugh 2023-02-23 07:18:07 +00:00
mrq
7bcedca771 I guess I can't easily toggle it outside of here, but it works 2023-02-23 07:02:06 +00:00
mrq
0ef8ab6872 shut up 2023-02-23 06:12:27 +00:00
mrq
58600274ac Disabling bitsandbytes optimization as default for now, on the off chance that it actually produces garbage (which shouldn't happen, there's no chance, if training at float16 from a model at float16 works fine, then this has to work) 2023-02-23 03:22:59 +00:00
mrq
918473807f Merge pull request 'bitsandbytes' (#2) from bitsandbytes into master
Reviewed-on: #2
2023-02-23 03:16:25 +00:00
mrq
6676c89c0e I sucked off the hypothetical wizard again, just using BNB's ADAM optimizer nets HUGE savings, but I don't know the output costs, will need to test 2023-02-23 02:42:17 +00:00
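The savings mentioned above come from replacing the stock optimizer with bitsandbytes' 8-bit Adam, which keeps optimizer state in 8-bit. A minimal sketch (the tiny linear model, learning rate, and CUDA device are placeholders, not this repo's training code):

    # Swap torch.optim.Adam for bitsandbytes' 8-bit Adam to shrink optimizer-state memory.
    # Requires a CUDA device; the dummy model exists only to make the example runnable.
    import torch
    import bitsandbytes as bnb

    model = torch.nn.Linear(512, 512).cuda()
    optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

    x = torch.randn(8, 512, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()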
mrq
01c0941a40 binaries 2023-02-22 23:09:27 +00:00
mrq
4427d7fb84 initial conversion (errors out) 2023-02-22 23:07:05 +00:00
mrq
6c284ef8ec oops 2023-02-18 03:27:04 +00:00
mrq
8db762fa17 thought I copied this over 2023-02-18 03:15:44 +00:00
mrq
73d9c3bd46 set output folder to be sane with the cwd as a reference point 2023-02-18 02:01:09 +00:00
mrq
5ecf7da881 Fix later 2023-02-17 20:49:29 +00:00
mrq
e3e8801e5f Fix I thought wasn't needed since it literally worked without it earlier 2023-02-17 20:41:20 +00:00
mrq
535549c3f3 add some snark about the kludge I had to fix, and the kludge I used to fix it 2023-02-17 19:20:19 +00:00
mrq
a09cf98c7f more cleanup, pip-ifying won't work, got an alternative 2023-02-17 15:47:55 +00:00
mrq
6afa2c299e break if your dataset size is smaller than your batch size 2023-02-17 04:08:27 +00:00
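The change above makes training fail fast instead of misbehaving when the dataset is smaller than the batch size; a minimal sketch of that kind of guard (the helper name and message wording are illustrative):

    # Raise early if the dataset cannot fill a single batch.
    def validate_batch_size(dataset, batch_size):
        if len(dataset) < batch_size:
            raise ValueError(
                f"Dataset has {len(dataset)} samples but batch size is {batch_size}; "
                "lower the batch size or add more data."
            )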
mrq
94d0f16608 Necessary fixes to get it to work 2023-02-17 02:03:00 +00:00
mrq
49e23b226b pip-ify 2023-02-17 00:33:50 +00:00
James Betker
f31a333c4f more sampling fixes 2022-10-10 20:11:28 -06:00
James Betker
5d172fbf7e Fix eval 2022-10-10 14:22:36 -06:00
James Betker
9502e0755e ugh 2022-10-10 12:15:51 -06:00
James Betker
fce2c8f5db and listify them 2022-10-10 12:13:49 -06:00
James Betker
3cf78e3c44 train mel head even when not 2022-10-10 12:10:56 -06:00
James Betker
cc74a43675 Checkin 2022-10-10 11:30:20 -06:00
James Betker
3cb14123bc glc fix 2022-07-29 11:24:36 -06:00
James Betker
4ddd01a7fb support generating cheaters from the new cheater network 2022-07-29 09:19:20 -06:00
James Betker
27a9b1b750 rename perplexity->log perplexity 2022-07-28 09:48:40 -06:00
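For context on the rename above: log perplexity is just the mean negative log-likelihood, and perplexity is its exponential. A minimal sketch with placeholder tensors (not the repo's eval code):

    # log perplexity = mean NLL; perplexity = exp(log perplexity)
    import torch
    import torch.nn.functional as F

    logits = torch.randn(32, 1000)            # (batch, vocab) dummy scores
    targets = torch.randint(0, 1000, (32,))   # dummy labels

    log_perplexity = F.cross_entropy(logits, targets)
    perplexity = log_perplexity.exp()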
James Betker
1d68624828 fix some imports.. 2022-07-28 02:35:32 -06:00
James Betker
cfe907f13f i like this better 2022-07-28 02:33:23 -06:00
James Betker
d44ed5d12d probably too harsh on ninfs 2022-07-28 01:33:54 -06:00
James Betker
4509cfc705 track logperp for diffusion evals 2022-07-28 01:30:44 -06:00
James Betker
19eb939ccf gd perplexity
# Conflicts:
#	codes/trainer/eval/music_diffusion_fid.py
2022-07-28 00:25:05 -06:00
James Betker
a1bbde8a43 few things 2022-07-26 11:52:03 -06:00
James Betker
f8108cfdb2 update environment and fix a bunch of deps 2022-07-24 23:43:25 -06:00
James Betker
45afefabed fix booboo 2022-07-24 18:00:14 -06:00
James Betker
cc62ba9cba few more tfd13 things 2022-07-24 17:39:33 -06:00
James Betker
f3d967dbf5 remove eta from mdf 2022-07-24 17:21:20 -06:00
James Betker
76464ca063 some fixes to mdf to support new archs 2022-07-21 10:55:50 -06:00
James Betker
13c263e9fb go all in on m2wv3 2022-07-21 00:51:27 -06:00
James Betker
24a78bd7d1 update tfd14 too 2022-07-21 00:45:33 -06:00
James Betker
02ebda42f2 #yolo 2022-07-21 00:43:03 -06:00
James Betker
b92ff8de78 misc 2022-07-20 23:59:32 -06:00
James Betker
a1743d26aa Revert "Try to squeeze a bit more performance out of this arch"
This reverts commit 767f963392.
2022-07-20 23:57:56 -06:00
James Betker
767f963392 Try to squeeze a bit more performance out of this arch 2022-07-20 23:51:11 -06:00
James Betker
b9d0f7e6de simplify parameterization a bit 2022-07-20 23:41:54 -06:00
James Betker
ee8ceed6da rework tfd13 further
- use a gated activation layer for both attention & convs
- add a relativistic learned position bias. I believe this is similar to the T5 position encodings but it is simpler and learned
- get rid of prepending to the attention matrix - this doesn't really work that well. the model eventually learns to attend one of its heads to these blocks but why not just concat if it is doing that?
2022-07-20 23:28:29 -06:00
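The gated activation layer mentioned in the tfd13 rework above can be sketched as a GLU-style gate; the exact gating and position-bias details in tfd13 may differ, so treat this as an assumption-laden illustration rather than the repo's implementation:

    # GLU-style gated activation: project to twice the width, split into value and gate,
    # and modulate the value path with a sigmoid of the gate path.
    import torch
    import torch.nn as nn

    class GatedActivation(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Linear(dim, dim * 2)

        def forward(self, x):
            value, gate = self.proj(x).chunk(2, dim=-1)
            return value * torch.sigmoid(gate)

    layer = GatedActivation(256)
    out = layer(torch.randn(4, 10, 256))      # (batch, sequence, channels)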
James Betker
40427de8e3 update tfd13 for inference 2022-07-20 21:51:25 -06:00