Commit Graph

874 Commits

Author SHA1 Message Date
mrq 0cca4eb943 disable this cringe precheck for now since it causes problems 2025-05-22 13:21:36 -05:00
mrq f12746b091 allow defining the default model name through env var, register nemo-larger in the model name list thing 2025-05-21 16:50:59 -05:00
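(For context, the env-var override the commit above describes usually looks like the minimal sketch below; the variable name `VALLE_DEFAULT_MODEL` and the registry contents are illustrative assumptions, not the repo's actual identifiers.)

```python
import os

# hypothetical registry; "nemo-larger" is registered per the commit above
MODEL_NAMES = ["ar+nar-len-llama-8", "nemo-larger"]

def default_model_name() -> str:
    # VALLE_DEFAULT_MODEL is an assumed env var name, for illustration only
    name = os.environ.get("VALLE_DEFAULT_MODEL", MODEL_NAMES[0])
    # fall back to the first registered model if the override is unknown
    return name if name in MODEL_NAMES else MODEL_NAMES[0]
```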
mrq e46d7ef2cb warn and ignore export when lora training because the state dict exported during training is wrong 2025-05-20 23:38:10 -05:00
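(A rough sketch of the guard this commit describes, under assumed placeholder names; `maybe_export`, `training_lora`, and `export_fn` are not the repo's actual identifiers.)

```python
import logging

logger = logging.getLogger(__name__)

def maybe_export(model, export_fn, training_lora: bool = False):
    # per the commit above: the state dict captured mid-LoRA-training is
    # wrong, so the export is warned about and skipped rather than attempted
    if training_lora:
        logger.warning("LoRA training is active; ignoring export request")
        return None
    return export_fn(model)
```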
mrq fee02f4153 added option to explicitly load a lora without having to lobotomize yourself with creating a yaml just to do so 2025-05-20 23:28:29 -05:00
mrq 5018ddb107 i don't know why this managed to escape my attention 2025-05-20 15:13:21 -05:00
mrq b2b243e7e7 addresses #9 2025-05-05 13:03:44 -05:00
mrq 5fe01ffc6c more notes / re-enabled top-k/p samplers for new implementation 2025-04-19 14:04:34 -05:00
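(For reference, a generic PyTorch top-k/top-p filter of the kind being re-enabled here; this is the standard technique, not vall_e's exact sampler code.)

```python
import torch

def sample_top_k_top_p(logits: torch.Tensor, top_k: int = 0, top_p: float = 1.0) -> torch.Tensor:
    # logits: (batch, vocab). top_k=0 and top_p=1.0 disable the respective filter.
    if top_k > 0:
        # mask out everything below the k-th largest logit
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        # keep the smallest prefix of sorted tokens whose mass reaches top_p
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()  # always keep the top token
        remove[..., 0] = False
        logits = logits.masked_fill(remove.scatter(-1, sorted_idx, remove), float("-inf"))
    return torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
```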
mrq f8e1d110dc when you, for once, use your main rig to test, forget to, and then port things back over 2025-04-18 20:49:00 -05:00
mrq d9e18037cc new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further) 2025-04-18 20:36:44 -05:00
mrq 98d1d8cb1e added some more notes, tweaks (RIP DAC, it's over) 2025-04-17 20:24:40 -05:00
mrq 9e27d2e02e huggingface zerogpu cringe 2025-04-16 15:25:45 -05:00
mrq 814146a5e0 more settings bloat because there seems to be instability with the encoder as-is 2025-04-12 12:53:44 -05:00
mrq f144389920 initializing the level_weights was the culprit for killing newly trained models............. 2025-04-10 23:06:16 -05:00
mrq 6c6a34dd21 i can't be assed to test if the prior commit works so being explicit like this should help until i can be bothered to halt training just to test this 2025-04-07 23:13:35 -05:00
mrq 6d42c9ae23 how foolish of me, not having a softmax as float32 (maybe addresses an emergent regression where bfloat16 training shits the bed where float16+loss scaling doesn't) 2025-04-07 22:51:52 -05:00
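(The fix described above, in sketch form: computing the softmax in float32 regardless of the activation dtype. bfloat16's short mantissa can visibly degrade softmax, which would line up with bf16 runs misbehaving while fp16+loss-scaling runs don't.)

```python
import torch

def softmax_fp32(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # upcast to float32 for the softmax itself, then cast back, so bf16/fp16
    # activations don't lose precision inside the normalization
    return torch.softmax(scores, dim=dim, dtype=torch.float32).to(scores.dtype)
```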
mrq d6cd848c32 goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again 2025-04-06 21:05:29 -05:00
mrq 1e22519d94 diagnosed both hf/llama.cpp versions to probably just being a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf) 2025-04-05 22:05:39 -05:00
mrq c34763769a ugh 2025-04-05 18:58:25 -05:00
mrq b6692ce3de ugh 2025-04-05 18:20:46 -05:00
mrq 4a909ceff8 temp fix for vall_e.cpp demask scoring regression 2025-04-05 11:04:26 -05:00
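("Demask scoring" refers to the confidence scoring used by iterative masked (NAR-len style) decoding; a generic single-iteration sketch follows. Names and shapes are illustrative, not vall_e.cpp's actual code.)

```python
import torch

def demask_step(logits, tokens, mask, n_unmask):
    # one iteration of confidence-based demasking: sample every position,
    # score each masked slot by its sampled token's probability, commit the
    # n_unmask most confident, and leave the rest masked for the next pass
    probs = torch.softmax(logits, dim=-1)                     # (T, vocab)
    sampled = torch.multinomial(probs, 1).squeeze(-1)         # (T,)
    scores = probs.gather(-1, sampled[:, None]).squeeze(-1)   # (T,)
    scores = scores.masked_fill(~mask, float("-inf"))         # rank only masked slots
    keep = scores.topk(min(n_unmask, int(mask.sum()))).indices
    tokens, mask = tokens.clone(), mask.clone()
    tokens[keep] = sampled[keep]
    mask[keep] = False
    return tokens, mask
```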
mrq 44260f7445 tweaks 2025-04-05 10:27:07 -05:00
mrq 0ede3bfc12 updated vall_e.cpp, but i could have sworn it worked much better than this...... 2025-04-05 01:22:51 -05:00
mrq 28d39ef962 should not be working late 2025-04-03 23:32:58 -05:00
mrq bfe70e9d56 ugh 2025-04-03 23:26:00 -05:00
mrq 2e93438867 reintroduced sampler_type = speaker because I think this might salvage the nemo model to have better speaker similarities 2025-04-03 19:01:10 -05:00
mrq caad99ab78 fix for bsz>1 because I forgot the old implementation implicitly handles this 2025-04-02 17:17:37 -05:00
mrq 068dbdb785 ugh 2025-04-02 17:05:16 -05:00
mrq 0e995dbf2c is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why) 2025-04-02 17:01:24 -05:00
mrq 7a0956863d oops 2025-03-31 21:11:43 -05:00
mrq a1184586ef should never have trusted mse_loss, it never works 2025-03-31 20:59:13 -05:00
mrq 99f251c768 slight tweaks to condition-less NS/SR 2025-03-30 10:37:40 -05:00
mrq 478aea0e8c tweaks 2025-03-28 19:49:54 -05:00
mrq 6ae282e090 re-added noise dataloader sampler whatever for the old implementation's other tasks that require it 2025-03-28 15:07:06 -05:00
mrq 90b3509404 I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules 2025-03-27 13:27:51 -05:00
mrq 2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) 2025-03-27 00:51:41 -05:00
mrq 4d777b5618 add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) 2025-03-26 12:08:47 -05:00
mrq 09e9438941 ugh 2025-03-25 23:24:01 -05:00
mrq 8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) 2025-03-25 23:06:16 -05:00
mrq aa8b32d97e added more notes (although I could have sworn I had more notes that I can't recall) 2025-03-25 18:53:06 -05:00
mrq df5b870908 added remark about not using sliding attention 2025-03-22 12:44:34 -05:00
mrq 02a8bcbe29 fixed errant index error (although it makes me wonder if my segmented masking is still flawed) 2025-03-21 23:41:34 -05:00
mrq d1d91295b3 add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... 2025-03-21 19:05:49 -05:00
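(To illustrate what a segmented sliding mask looks like, and where a prom-less segment could bite: a toy sketch, not the repo's implementation. If mask construction assumes a segment that a sample doesn't have, a row can end up with nothing attendable, and softmax over an all-masked row yields NaNs.)

```python
import torch

def segmented_sliding_mask(segment_ids: torch.Tensor, window: int = 0) -> torch.Tensor:
    # (T, T) boolean mask: position i may attend to position j only when both
    # share a segment id and j <= i; window > 0 further restricts attention to
    # a sliding window of recent positions within the segment
    T = segment_ids.numel()
    i = torch.arange(T)[:, None]
    j = torch.arange(T)[None, :]
    allow = (segment_ids[:, None] == segment_ids[None, :]) & (j <= i)
    if window > 0:
        allow &= (i - j) < window
    # guard against the prom-less hazard described in the commit above
    assert bool(allow.any(dim=-1).all()), "some position can attend to nothing"
    return allow
```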
mrq 589cfb0e18 yuge speedup because of a dumb oversight 2025-03-20 17:39:41 -05:00
mrq 8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... 2025-03-20 15:56:15 -05:00
mrq 9a7458cf17 fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings 2025-03-19 22:41:48 -05:00
mrq 61de653ad9 now causal training should work again 2025-03-19 14:20:19 -05:00
mrq 85b9dd47c1 ugh 2025-03-19 13:31:50 -05:00
mrq 81acd565b3 re-enable these 2025-03-18 20:59:33 -05:00
mrq 5479d2eacc more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) 2025-03-18 19:34:37 -05:00
mrq 9a8a8e3195 off by one bateman 2025-03-18 08:40:43 -05:00