Commit Graph

846 Commits

Author SHA1 Message Date
mrq 7a0956863d oops 2025-03-31 21:11:43 -05:00
mrq a1184586ef should never have trusted mse_loss, it never works 2025-03-31 20:59:13 -05:00
mrq 99f251c768 slight tweaks to condition-less NS/SR 2025-03-30 10:37:40 -05:00
mrq 478aea0e8c tweaks 2025-03-28 19:49:54 -05:00
mrq 6ae282e090 re-added noise dataloader sampler whatever for the old implementation's other tasks that require it 2025-03-28 15:07:06 -05:00
mrq 90b3509404 I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules 2025-03-27 13:27:51 -05:00
mrq 2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) 2025-03-27 00:51:41 -05:00
mrq 4d777b5618 add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) 2025-03-26 12:08:47 -05:00
mrq 09e9438941 ugh 2025-03-25 23:24:01 -05:00
mrq 8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) 2025-03-25 23:06:16 -05:00
mrq aa8b32d97e added more notes (although I could have sworn I have had more notes that i can't recall) 2025-03-25 18:53:06 -05:00
mrq df5b870908 added remark about not using sliding attention 2025-03-22 12:44:34 -05:00
mrq 02a8bcbe29 fixed errant index error (although it makes me wonder if my segmented masking is still flawed) 2025-03-21 23:41:34 -05:00
mrq d1d91295b3 add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... 2025-03-21 19:05:49 -05:00
mrq 589cfb0e18 yuge speedup because of a dumb oversight 2025-03-20 17:39:41 -05:00
mrq 8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... 2025-03-20 15:56:15 -05:00
mrq 9a7458cf17 fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings 2025-03-19 22:41:48 -05:00
mrq 61de653ad9 now causal training should work again 2025-03-19 14:20:19 -05:00
mrq 85b9dd47c1 ugh 2025-03-19 13:31:50 -05:00
mrq 81acd565b3 re-enable these 2025-03-18 20:59:33 -05:00
mrq 5479d2eacc more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) 2025-03-18 19:34:37 -05:00
mrq 9a8a8e3195 off by one bateman 2025-03-18 08:40:43 -05:00
mrq 0280e72257 ugh 2025-03-17 21:49:45 -05:00
mrq b0dba9db07 this may bite me in the ass 2025-03-17 21:46:50 -05:00
mrq 2dfef693c4 comments for clarity 2025-03-16 11:30:23 -05:00
mrq c5475ebc91 another dataloader optimization 2025-03-15 20:18:58 -05:00
mrq bee2688dea ugh 2025-03-15 16:50:21 -05:00
mrq 2053580838 updated dataloader to hopefully reduce RAM usage 2025-03-15 13:14:37 -05:00
mrq 9cfbf94b1c config-ify the len_loss_factor 2025-03-14 20:30:48 -05:00
mrq ca8cc15271 more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) 2025-03-14 20:18:25 -05:00
mrq 6ee505cffd fixed dac 2025-03-12 23:17:27 -05:00
mrq ba5f3d19b4 use the FSQ-targeted encoder/decodede whole-ly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesnt (and some notes) 2025-03-12 22:47:19 -05:00
mrq 2ccf1b5740 actually do duration prediction 2025-03-11 22:14:54 -05:00
mrq 5c512717a6 len prediction for new model (and remove logit normalization since it kills inferencing) 2025-03-11 20:33:09 -05:00
mrq 5f98543d4d ughh 2025-03-10 21:18:57 -05:00
mrq 8ac03aac8a ugh 2025-03-10 21:14:56 -05:00
mrq 5670fcb23f hopefully the final tweaks needed for this bastard of a model 2025-03-10 20:59:11 -05:00
mrq 00d1fed217 another optimization (within the dataloader because the similar utterance sampler was mondo slow) 2025-03-08 17:10:50 -06:00
mrq 5e9d1a5302 one more time one more time (this normalization isn't a spook) 2025-03-07 19:32:42 -06:00
mrq 93044829af one more time (could have sworn i tested it with batch size > 1) 2025-03-07 19:14:33 -06:00
mrq 6cea840710 oops 2025-03-07 18:57:25 -06:00
mrq dbd34b6430 add specialized calc_loss because schizo 2025-03-07 18:44:11 -06:00
mrq 8d848ed549 handle case of dropping cond for segment mask 2025-03-07 14:11:58 -06:00
mrq 89e52b9877 ugh 2025-03-07 13:55:57 -06:00
mrq 6afc2b7526 gut feeling to change the attention mask 2025-03-07 13:51:59 -06:00
mrq 91ede71cf0 ugh 2025-03-06 17:19:27 -06:00
mrq 2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it) 2025-03-06 17:07:29 -06:00
mrq a30dffcca7 wandb additions (to-do eventually, upload samples as artifacts) 2025-03-06 15:44:40 -06:00
mrq ec87308d75 final tweaks before training this meme 44khz model for the 3rd time 2025-03-06 15:31:15 -06:00
mrq 5cd71ef238 QoL so I can stop having to manually inject different configs 2025-03-06 14:48:14 -06:00