5fe01ffc6c | more notes / re-enabled top-k/p samplers for the new implementation | 2025-04-19 14:04:34 -05:00
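
(For reference, a minimal sketch of what top-k/top-p logit filtering looks like in PyTorch; the function name and defaults here are hypothetical, not the repo's actual sampler code.)

    import torch

    def top_k_top_p(logits: torch.Tensor, k: int = 64, p: float = 0.95) -> torch.Tensor:
        # top-k: mask out everything below the k-th largest logit
        if k > 0:
            kth = torch.topk(logits, k, dim=-1).values[..., -1, None]
            logits = logits.masked_fill(logits < kth, float("-inf"))
        # top-p: mask out the tail once cumulative probability exceeds p
        if p < 1.0:
            sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
            cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
            remove = cum_probs > p
            remove[..., 1:] = remove[..., :-1].clone()  # shift right so the top token always survives
            remove[..., 0] = False
            logits = logits.masked_fill(remove.scatter(-1, sorted_idx, remove), float("-inf"))
        return logits
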
d9e18037cc | new implementation tweaks and fixes to make it actually better (there were a lot of outright wrong things being done that harmed output quality; will evaluate the model further) | 2025-04-18 20:36:44 -05:00
98d1d8cb1e | added some more notes, tweaks (RIP DAC, it's over) | 2025-04-17 20:24:40 -05:00
6d42c9ae23 | how foolish of me, not computing the softmax in float32 (maybe addresses an emergent regression where bfloat16 training shits the bed while float16 + loss scaling doesn't) | 2025-04-07 22:51:52 -05:00
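
(The fix alluded to above, sketched under assumptions about the attention code: bfloat16 stores far fewer mantissa bits than float16, so a softmax computed natively in it can lose enough precision to destabilize training, while the matmuls are fine in half precision.)

    import torch

    def attention(q, k, v, scale: float):
        scores = (q @ k.transpose(-2, -1)) * scale
        # upcast only the softmax to float32, then cast back to the input dtype
        probs = torch.softmax(scores, dim=-1, dtype=torch.float32).to(q.dtype)
        return probs @ v
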
d6cd848c32 | goodbye nvidia/audio-codec-44khz, fingers crossed for DAC again | 2025-04-06 21:05:29 -05:00
2e93438867 | reintroduced sampler_type = speaker since this might salvage the nemo model by giving it better speaker similarity | 2025-04-03 19:01:10 -05:00
0e995dbf2c | is this my last cope? (falling back to explicit duration prediction, as this regression just won't go away; also, the smaller model was lobotomized because my ROCm setup has a botched SDPA for who knows what reason) | 2025-04-02 17:01:24 -05:00
6ae282e090 | re-added the noise dataloader sampler for the old implementation's other tasks that require it | 2025-03-28 15:07:06 -05:00
90b3509404 | I'll just cope and say I can't apply segmented attention masks to the smaller model, as it's been trained too long without them; the regression came from dumb Python aliasing rules | 2025-03-27 13:27:51 -05:00
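
(An illustrative example of the kind of aliasing pitfall being blamed; hypothetical, since the actual offending code isn't in the message. Basic indexing in PyTorch returns a view, so mutating the "copy" silently mutates the original mask.)

    import torch

    mask = torch.ones(4, 4, dtype=torch.bool)
    row = mask[0]            # a view: `row` aliases mask's storage
    row[:] = False           # silently clears mask[0] as well
    row = mask[0].clone()    # the fix: copy before mutating
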
2fd82a7a22 | cannot get the segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) | 2025-03-27 00:51:41 -05:00
4d777b5618 | added a remark that segmented attention might actually be broken (for some reason this only emerged recently; need to investigate) | 2025-03-26 12:08:47 -05:00
8641c87611 | nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) | 2025-03-25 23:06:16 -05:00
aa8b32d97e | added more notes (although I could have sworn I had more notes that I can't recall) | 2025-03-25 18:53:06 -05:00
df5b870908 | added remark about not using sliding attention | 2025-03-22 12:44:34 -05:00
9a7458cf17 | fixed inferencing since I did delete the len_emb; some more notes on the model, since it seems I just had bad experimental settings | 2025-03-19 22:41:48 -05:00
81acd565b3 | re-enable these | 2025-03-18 20:59:33 -05:00
b0dba9db07 | this may bite me in the ass | 2025-03-17 21:46:50 -05:00
2dfef693c4 | comments for clarity | 2025-03-16 11:30:23 -05:00
9cfbf94b1c | config-ify the len_loss_factor | 2025-03-14 20:30:48 -05:00
ba5f3d19b4 | use the FSQ-targeted encoder/decoder wholesale, as it works for EnCodec too while the RVQ-targeted encoder/decoder doesn't (and some notes) | 2025-03-12 22:47:19 -05:00
5c512717a6 | len prediction for the new model (and removed logit normalization since it kills inferencing) | 2025-03-11 20:33:09 -05:00
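
(One common form of logit normalization, guessed here for illustration since the commit doesn't show the exact variant removed: dividing logits by their L2 norm caps model confidence, which can flatten the output distribution enough at inference to explain it "killing inferencing".)

    import torch

    def logit_norm(logits: torch.Tensor, tau: float = 0.04) -> torch.Tensor:
        # LogitNorm-style: constrain the logit vector to roughly constant norm
        return logits / (tau * (logits.norm(p=2, dim=-1, keepdim=True) + 1e-7))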