d6cd848c32 | goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again | 2025-04-06 21:05:29 -05:00
1e22519d94 | diagnosed both hf/llama.cpp versions as probably just a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf) | 2025-04-05 22:05:39 -05:00
c34763769a | ugh | 2025-04-05 18:58:25 -05:00
b6692ce3de | ugh | 2025-04-05 18:20:46 -05:00
4a909ceff8 | temp fix for vall_e.cpp demask scoring regression | 2025-04-05 11:04:26 -05:00
44260f7445 | tweaks | 2025-04-05 10:27:07 -05:00
0ede3bfc12 | updated vall_e.cpp, but I could have sworn it worked much better than this... | 2025-04-05 01:22:51 -05:00
28d39ef962 | should not be working late | 2025-04-03 23:32:58 -05:00
bfe70e9d56 | ugh | 2025-04-03 23:26:00 -05:00
2e93438867 | reintroduced sampler_type = speaker because I think this might salvage the nemo model's speaker similarity | 2025-04-03 19:01:10 -05:00
caad99ab78 | fix for bsz>1 because I forgot the old implementation implicitly handles this | 2025-04-02 17:17:37 -05:00
068dbdb785 | ugh | 2025-04-02 17:05:16 -05:00
0e995dbf2c | is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because my ROCm setup has a botched SDPA for who knows what reason) | 2025-04-02 17:01:24 -05:00
7a0956863d | oops | 2025-03-31 21:11:43 -05:00
a1184586ef | should never have trusted mse_loss, it never works | 2025-03-31 20:59:13 -05:00
99f251c768 | slight tweaks to condition-less NS/SR | 2025-03-30 10:37:40 -05:00
478aea0e8c | tweaks | 2025-03-28 19:49:54 -05:00
6ae282e090 | re-added the noise dataloader sampler for the old implementation's other tasks that require it | 2025-03-28 15:07:06 -05:00
90b3509404 | I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb Python aliasing rules | 2025-03-27 13:27:51 -05:00
2fd82a7a22 | cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) | 2025-03-27 00:51:41 -05:00
4d777b5618 | add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) | 2025-03-26 12:08:47 -05:00
09e9438941 | ugh | 2025-03-25 23:24:01 -05:00
8641c87611 | nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) | 2025-03-25 23:06:16 -05:00
aa8b32d97e | added more notes (although I could have sworn I had more notes that I can't recall) | 2025-03-25 18:53:06 -05:00
df5b870908 | added remark about not using sliding attention | 2025-03-22 12:44:34 -05:00
02a8bcbe29 | fixed errant index error (although it makes me wonder if my segmented masking is still flawed) | 2025-03-21 23:41:34 -05:00
d1d91295b3 | add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation... | 2025-03-21 19:05:49 -05:00
589cfb0e18 | yuge speedup because of a dumb oversight | 2025-03-20 17:39:41 -05:00
8068f24e35 | cleaned up parallel nar, I think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... | 2025-03-20 15:56:15 -05:00
9a7458cf17 | fixed inferencing since I did delete the len_emb; some more notes on the model since it seems I just had bad experimental settings | 2025-03-19 22:41:48 -05:00
61de653ad9 | now causal training should work again | 2025-03-19 14:20:19 -05:00
85b9dd47c1 | ugh | 2025-03-19 13:31:50 -05:00
81acd565b3 | re-enable these | 2025-03-18 20:59:33 -05:00
5479d2eacc | more tweaks to the new implementation (properly trim the len stuff to save some params, set the decoder's d_ffn expansion to 2 to maybe also make it faster, etc.) | 2025-03-18 19:34:37 -05:00
9a8a8e3195 | off by one bateman | 2025-03-18 08:40:43 -05:00
0280e72257 | ugh | 2025-03-17 21:49:45 -05:00
b0dba9db07 | this may bite me in the ass | 2025-03-17 21:46:50 -05:00
2dfef693c4 | comments for clarity | 2025-03-16 11:30:23 -05:00
c5475ebc91 | another dataloader optimization | 2025-03-15 20:18:58 -05:00
bee2688dea | ugh | 2025-03-15 16:50:21 -05:00
2053580838 | updated dataloader to hopefully reduce RAM usage | 2025-03-15 13:14:37 -05:00
9cfbf94b1c | config-ify the len_loss_factor | 2025-03-14 20:30:48 -05:00
ca8cc15271 | more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce which audio backend to use now that I'm supporting other ones again // added easy top-sampler settings back for the new implementation) | 2025-03-14 20:18:25 -05:00
6ee505cffd | fixed dac | 2025-03-12 23:17:27 -05:00
ba5f3d19b4 | use the FSQ-targeted encoder/decoder wholesale since it works for EnCodec too, whereas the RVQ-targeted encoder/decoder doesn't (and some notes) | 2025-03-12 22:47:19 -05:00
2ccf1b5740 | actually do duration prediction | 2025-03-11 22:14:54 -05:00
5c512717a6 | len prediction for new model (and remove logit normalization since it kills inferencing) | 2025-03-11 20:33:09 -05:00
5f98543d4d | ughh | 2025-03-10 21:18:57 -05:00
8ac03aac8a | ugh | 2025-03-10 21:14:56 -05:00
5670fcb23f | hopefully the final tweaks needed for this bastard of a model | 2025-03-10 20:59:11 -05:00