|
8068f24e35
|
cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8...
|
2025-03-20 15:56:15 -05:00 |
|
|
9a7458cf17
|
fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings
|
2025-03-19 22:41:48 -05:00 |
|
|
61de653ad9
|
now causal training should work again
|
2025-03-19 14:20:19 -05:00 |
|
|
85b9dd47c1
|
ugh
|
2025-03-19 13:31:50 -05:00 |
|
|
81acd565b3
|
re-enable these
|
2025-03-18 20:59:33 -05:00 |
|
|
5479d2eacc
|
more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.)
|
2025-03-18 19:34:37 -05:00 |
|
|
9a8a8e3195
|
off by one bateman
|
2025-03-18 08:40:43 -05:00 |
|
|
0280e72257
|
ugh
|
2025-03-17 21:49:45 -05:00 |
|
|
b0dba9db07
|
this may bite me in the ass
|
2025-03-17 21:46:50 -05:00 |
|
|
2dfef693c4
|
comments for clarity
|
2025-03-16 11:30:23 -05:00 |
|
|
c5475ebc91
|
another dataloader optimization
|
2025-03-15 20:18:58 -05:00 |
|
|
bee2688dea
|
ugh
|
2025-03-15 16:50:21 -05:00 |
|
|
2053580838
|
updated dataloader to hopefully reduce RAM usage
|
2025-03-15 13:14:37 -05:00 |
|
|
9cfbf94b1c
|
config-ify the len_loss_factor
|
2025-03-14 20:30:48 -05:00 |
|
|
ca8cc15271
|
more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce which audio backend to use now that I'm supporting other ones again // added easy top-sampler settings back for new implementation)
|
2025-03-14 20:18:25 -05:00 |
|
|
6ee505cffd
|
fixed dac
|
2025-03-12 23:17:27 -05:00 |
|
|
ba5f3d19b4
|
use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, whereas the RVQ-targeted encoder/decoder doesn't (and some notes)
|
2025-03-12 22:47:19 -05:00 |
|
|
2ccf1b5740
|
actually do duration prediction
|
2025-03-11 22:14:54 -05:00 |
|
|
5c512717a6
|
len prediction for new model (and remove logit normalization since it kills inferencing)
|
2025-03-11 20:33:09 -05:00 |
|
|
5f98543d4d
|
ughh
|
2025-03-10 21:18:57 -05:00 |
|
|
8ac03aac8a
|
ugh
|
2025-03-10 21:14:56 -05:00 |
|
|
5670fcb23f
|
hopefully the final tweaks needed for this bastard of a model
|
2025-03-10 20:59:11 -05:00 |
|
|
00d1fed217
|
another optimization (within the dataloader because the similar utterance sampler was mondo slow)
|
2025-03-08 17:10:50 -06:00 |
|
|
5e9d1a5302
|
one more time one more time (this normalization isn't a spook)
|
2025-03-07 19:32:42 -06:00 |
|
|
93044829af
|
one more time (could have sworn i tested it with batch size > 1)
|
2025-03-07 19:14:33 -06:00 |
|
|
6cea840710
|
oops
|
2025-03-07 18:57:25 -06:00 |
|
|
dbd34b6430
|
add specialized calc_loss because schizo
|
2025-03-07 18:44:11 -06:00 |
|
|
8d848ed549
|
handle case of dropping cond for segment mask
|
2025-03-07 14:11:58 -06:00 |
|
|
89e52b9877
|
ugh
|
2025-03-07 13:55:57 -06:00 |
|
|
6afc2b7526
|
gut feeling to change the attention mask
|
2025-03-07 13:51:59 -06:00 |
|
|
91ede71cf0
|
ugh
|
2025-03-06 17:19:27 -06:00 |
|
|
2dd80a03ff
|
stuff for interfacing with the loss scaler value (because I want to cap it)
|
2025-03-06 17:07:29 -06:00 |
|
|
a30dffcca7
|
wandb additions (to-do eventually, upload samples as artifacts)
|
2025-03-06 15:44:40 -06:00 |
|
|
ec87308d75
|
final tweaks before training this meme 44khz model for the 3rd time
|
2025-03-06 15:31:15 -06:00 |
|
|
5cd71ef238
|
QoL so I can stop having to manually inject different configs
|
2025-03-06 14:48:14 -06:00 |
|
|
0d809561c6
|
accuracy k=1 and k=80 because I'm probably dumb for k=10 as the default since it does not represent any use case
|
2025-03-05 16:35:34 -06:00 |
|
|
2fb2b732fc
|
wow that was fast
|
2025-03-04 23:17:18 -06:00 |
|
|
462f71e2f7
|
ugh
|
2025-03-04 14:57:00 -06:00 |
|
|
1cd24f3381
|
a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it)
|
2025-03-04 14:53:02 -06:00 |
|
|
0451f75e33
|
now that the new model seems a little more promising, i can re-document things non-cynically
|
2025-03-03 13:21:41 -06:00 |
|
|
3f1070f575
|
tweaks
|
2025-03-02 22:36:25 -06:00 |
|
|
4afa4ccce5
|
at wit's end (perhaps the semantic token approach is the toughest pill to swallow)
|
2025-03-01 21:03:25 -06:00 |
|
|
1d3290b023
|
could have sworn this worked before, might have broken it when i decoupled from omegaconf
|
2025-03-01 19:30:26 -06:00 |
|
|
17094b8002
|
reticulating splines
|
2025-03-01 17:48:51 -06:00 |
|
|
56f8be4d62
|
lol
|
2025-02-28 22:15:37 -06:00 |
|
|
ddc49c89c5
|
the learning rate scheduler pill is a tough pill to swallow
|
2025-02-28 22:12:19 -06:00 |
|
|
b97faa8173
|
fixes...
|
2025-02-28 18:53:07 -06:00 |
|
|
4e7d885542
|
lol
|
2025-02-28 18:06:41 -06:00 |
|
|
a174c33db6
|
a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow)
|
2025-02-28 17:56:50 -06:00 |
|
|
09d82a26fe
|
ugh
|
2025-02-28 01:06:38 -06:00 |
|