Commit Graph

849 Commits

Author SHA1 Message Date
mrq
caad99ab78 fix for bsz>1 because I forgot the old implementation implicitly handles this 2025-04-02 17:17:37 -05:00
mrq
068dbdb785 ugh 2025-04-02 17:05:16 -05:00
mrq
0e995dbf2c is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why) 2025-04-02 17:01:24 -05:00
mrq
7a0956863d oops 2025-03-31 21:11:43 -05:00
mrq
a1184586ef should never have trusted mse_loss, it never works 2025-03-31 20:59:13 -05:00
mrq
99f251c768 slight tweaks to condition-less NS/SR 2025-03-30 10:37:40 -05:00
mrq
478aea0e8c tweaks 2025-03-28 19:49:54 -05:00
mrq
6ae282e090 re-added noise dataloader sampler whatever for the old implementation's other tasks that require it 2025-03-28 15:07:06 -05:00
mrq
90b3509404 I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules 2025-03-27 13:27:51 -05:00
mrq
2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) 2025-03-27 00:51:41 -05:00
mrq
4d777b5618 add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) 2025-03-26 12:08:47 -05:00
mrq
09e9438941 ugh 2025-03-25 23:24:01 -05:00
mrq
8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) 2025-03-25 23:06:16 -05:00
mrq
aa8b32d97e added more notes (although I could have sworn I have had more notes that i can't recall) 2025-03-25 18:53:06 -05:00
mrq
df5b870908 added remark about not using sliding attention 2025-03-22 12:44:34 -05:00
mrq
02a8bcbe29 fixed errant index error (although it makes me wonder if my segmented masking is still flawed) 2025-03-21 23:41:34 -05:00
mrq
d1d91295b3 add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... 2025-03-21 19:05:49 -05:00
mrq
589cfb0e18 yuge speedup because of a dumb oversight 2025-03-20 17:39:41 -05:00
mrq
8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... 2025-03-20 15:56:15 -05:00
mrq
9a7458cf17 fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings 2025-03-19 22:41:48 -05:00
mrq
61de653ad9 now causal training should work again 2025-03-19 14:20:19 -05:00
mrq
85b9dd47c1 ugh 2025-03-19 13:31:50 -05:00
mrq
81acd565b3 re-enable these 2025-03-18 20:59:33 -05:00
mrq
5479d2eacc more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) 2025-03-18 19:34:37 -05:00
mrq
9a8a8e3195 off by one bateman 2025-03-18 08:40:43 -05:00
mrq
0280e72257 ugh 2025-03-17 21:49:45 -05:00
mrq
b0dba9db07 this may bite me in the ass 2025-03-17 21:46:50 -05:00
mrq
2dfef693c4 comments for clarity 2025-03-16 11:30:23 -05:00
mrq
c5475ebc91 another dataloader optimization 2025-03-15 20:18:58 -05:00
mrq
bee2688dea ugh 2025-03-15 16:50:21 -05:00
mrq
2053580838 updated dataloader to hopefully reduce RAM usage 2025-03-15 13:14:37 -05:00
mrq
9cfbf94b1c config-ify the len_loss_factor 2025-03-14 20:30:48 -05:00
mrq
ca8cc15271 more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) 2025-03-14 20:18:25 -05:00
mrq
6ee505cffd fixed dac 2025-03-12 23:17:27 -05:00
mrq
ba5f3d19b4 use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesn't (and some notes) 2025-03-12 22:47:19 -05:00
mrq
2ccf1b5740 actually do duration prediction 2025-03-11 22:14:54 -05:00
mrq
5c512717a6 len prediction for new model (and remove logit normalization since it kills inferencing) 2025-03-11 20:33:09 -05:00
mrq
5f98543d4d ughh 2025-03-10 21:18:57 -05:00
mrq
8ac03aac8a ugh 2025-03-10 21:14:56 -05:00
mrq
5670fcb23f hopefully the final tweaks needed for this bastard of a model 2025-03-10 20:59:11 -05:00
mrq
00d1fed217 another optimization (within the dataloader because the similar utterance sampler was mondo slow) 2025-03-08 17:10:50 -06:00
mrq
5e9d1a5302 one more time one more time (this normalization isn't a spook) 2025-03-07 19:32:42 -06:00
mrq
93044829af one more time (could have sworn i tested it with batch size > 1) 2025-03-07 19:14:33 -06:00
mrq
6cea840710 oops 2025-03-07 18:57:25 -06:00
mrq
dbd34b6430 add specialized calc_loss because schizo 2025-03-07 18:44:11 -06:00
mrq
8d848ed549 handle case of dropping cond for segment mask 2025-03-07 14:11:58 -06:00
mrq
89e52b9877 ugh 2025-03-07 13:55:57 -06:00
mrq
6afc2b7526 gut feeling to change the attention mask 2025-03-07 13:51:59 -06:00
mrq
91ede71cf0 ugh 2025-03-06 17:19:27 -06:00
mrq
2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it) 2025-03-06 17:07:29 -06:00