Commit Graph

21 Commits

Author  SHA1  Message  Date
mrq  5fe01ffc6c  more notes / re-enabled top-k/p samplers for new implementation  2025-04-19 14:04:34 -05:00
mrq  d9e18037cc  new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further)  2025-04-18 20:36:44 -05:00
mrq  98d1d8cb1e  added some more notes, tweaks (RIP DAC, it's over)  2025-04-17 20:24:40 -05:00
mrq  6d42c9ae23  how foolish of me, not having a softmax as float32 (maybe addresses an emergent regression where bfloat16 training shits the bed where float16+loss scaling doesnt)  2025-04-07 22:51:52 -05:00
mrq  d6cd848c32  goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again  2025-04-06 21:05:29 -05:00
mrq  2e93438867  reintroduced sampler_type = speaker because I think this might salvage the nemo model to have better speaker similarities  2025-04-03 19:01:10 -05:00
mrq  0e995dbf2c  is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why)  2025-04-02 17:01:24 -05:00
mrq  6ae282e090  re-added noise dataloader sampler whatever for the old implementation's other tasks that require it  2025-03-28 15:07:06 -05:00
mrq  90b3509404  I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules  2025-03-27 13:27:51 -05:00
mrq  2fd82a7a22  cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...)  2025-03-27 00:51:41 -05:00
mrq  4d777b5618  add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate)  2025-03-26 12:08:47 -05:00
mrq  8641c87611  nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression)  2025-03-25 23:06:16 -05:00
mrq  aa8b32d97e  added more notes (although I could have sworn I have had more notes that i can't recall)  2025-03-25 18:53:06 -05:00
mrq  df5b870908  added remark about not using sliding attention  2025-03-22 12:44:34 -05:00
mrq  9a7458cf17  fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings  2025-03-19 22:41:48 -05:00
mrq  81acd565b3  re-enable these  2025-03-18 20:59:33 -05:00
mrq  b0dba9db07  this may bite me in the ass  2025-03-17 21:46:50 -05:00
mrq  2dfef693c4  comments for clarity  2025-03-16 11:30:23 -05:00
mrq  9cfbf94b1c  config-ify the len_loss_factor  2025-03-14 20:30:48 -05:00
mrq  ba5f3d19b4  use the FSQ-targeted encoder/decodede whole-ly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesnt (and some notes)  2025-03-12 22:47:19 -05:00
mrq  5c512717a6  len prediction for new model (and remove logit normalization since it kills inferencing)  2025-03-11 20:33:09 -05:00