24 Commits

Author SHA1 Message Date
mrq
b2b243e7e7 addresses #9 2025-05-05 13:03:44 -05:00
mrq
d9e18037cc new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further) 2025-04-18 20:36:44 -05:00
mrq
f144389920 the culprit was initializing the level_weights for killing newly trained models............. 2025-04-10 23:06:16 -05:00
mrq
44260f7445 tweaks 2025-04-05 10:27:07 -05:00
mrq
0e995dbf2c is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why) 2025-04-02 17:01:24 -05:00
mrq
a1184586ef should never have trusted mse_loss, it never works 2025-03-31 20:59:13 -05:00
mrq
2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) 2025-03-27 00:51:41 -05:00
mrq
8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) 2025-03-25 23:06:16 -05:00
mrq
8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... 2025-03-20 15:56:15 -05:00
mrq
81acd565b3 re-enable these 2025-03-18 20:59:33 -05:00
mrq
b0dba9db07 this may bite me in the ass 2025-03-17 21:46:50 -05:00
mrq
ca8cc15271 more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) 2025-03-14 20:18:25 -05:00
mrq
6ee505cffd fixed dac 2025-03-12 23:17:27 -05:00
mrq
ba5f3d19b4 use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesn't (and some notes) 2025-03-12 22:47:19 -05:00
mrq
5c512717a6 len prediction for new model (and remove logit normalization since it kills inferencing) 2025-03-11 20:33:09 -05:00
mrq
8ac03aac8a ugh 2025-03-10 21:14:56 -05:00
mrq
93044829af one more time (could have sworn i tested it with batch size > 1) 2025-03-07 19:14:33 -06:00
mrq
5cd71ef238 QoL so I can stop having to manually inject different configs 2025-03-06 14:48:14 -06:00
mrq
1cd24f3381 a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) 2025-03-04 14:53:02 -06:00
mrq
ddc49c89c5 the learning rate scheduler pill is a tough pill to swallow 2025-02-28 22:12:19 -06:00
mrq
a174c33db6 a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) 2025-02-28 17:56:50 -06:00
mrq
93feb5660f do not like that 2025-02-27 23:59:56 -06:00
mrq
b8e9f3d785 maybe this will work 2025-02-27 20:42:12 -06:00
mrq
2ea387c08a segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) 2025-02-26 21:26:13 -06:00