Commit Graph

823 Commits

Author | SHA1 | Message | Date
mrq | b0dba9db07 | this may bite me in the ass | 2025-03-17 21:46:50 -05:00
mrq | 2dfef693c4 | comments for clarity | 2025-03-16 11:30:23 -05:00
mrq | c5475ebc91 | another dataloader optimization | 2025-03-15 20:18:58 -05:00
mrq | bee2688dea | ugh | 2025-03-15 16:50:21 -05:00
mrq | 2053580838 | updated dataloader to hopefully reduce RAM usage | 2025-03-15 13:14:37 -05:00
mrq | 9cfbf94b1c | config-ify the len_loss_factor | 2025-03-14 20:30:48 -05:00
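The `len_loss_factor` change (9cfbf94b1c) is the usual pattern of promoting a hard-coded loss weight to a config field. A minimal sketch, assuming a dataclass-style config; only the field name is taken from the commit, the default value and surrounding names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class LossConfig:
    # previously a literal constant buried in the loss code (assumed)
    len_loss_factor: float = 0.001  # illustrative default

cfg = LossConfig()

def total_loss(ar_loss, len_loss, cfg: LossConfig = cfg):
    # weight the duration-prediction loss via config instead of a magic number
    return ar_loss + cfg.len_loss_factor * len_loss
```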
mrq | ca8cc15271 | more tweaks (vall_e.webui --yaml still breaks things; --model needs to deduce which audio backend to use now that I'm supporting other ones again // added easy top-sampler settings back for the new implementation) | 2025-03-14 20:18:25 -05:00
mrq | 6ee505cffd | fixed dac | 2025-03-12 23:17:27 -05:00
mrq | ba5f3d19b4 | use the FSQ-targeted encoder/decoder wholly, as it works for EnCodec too while the RVQ-targeted encoder/decoder doesn't (and some notes) | 2025-03-12 22:47:19 -05:00
mrq | 2ccf1b5740 | actually do duration prediction | 2025-03-11 22:14:54 -05:00
mrq | 5c512717a6 | len prediction for the new model (and remove logit normalization, since it kills inference) | 2025-03-11 20:33:09 -05:00
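For the duration-prediction commits (2ccf1b5740, 5c512717a6), a hedged sketch of what a "len" head can look like: pool the hidden states and regress the target length. The layer size and the regression formulation are assumptions; the repo may treat duration as a token sequence instead.

```python
import torch
import torch.nn as nn

class LenHead(nn.Module):
    """Hypothetical duration head: hidden states -> predicted log-length."""
    def __init__(self, d_model: int = 1024):  # assumed width
        super().__init__()
        self.proj = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [B, T, d_model] -> mean-pool over time, predict one scalar
        return self.proj(hidden.mean(dim=1)).squeeze(-1)

def len_loss(pred_log_len: torch.Tensor, target_len: torch.Tensor) -> torch.Tensor:
    # regress in log space so long utterances don't dominate the loss
    return nn.functional.mse_loss(pred_log_len, target_len.float().log())
```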
mrq | 5f98543d4d | ughh | 2025-03-10 21:18:57 -05:00
mrq | 8ac03aac8a | ugh | 2025-03-10 21:14:56 -05:00
mrq | 5670fcb23f | hopefully the final tweaks needed for this bastard of a model | 2025-03-10 20:59:11 -05:00
mrq | 00d1fed217 | another optimization (within the dataloader, because the similar-utterance sampler was mondo slow) | 2025-03-08 17:10:50 -06:00
mrq | 5e9d1a5302 | one more time one more time (this normalization isn't a spook) | 2025-03-07 19:32:42 -06:00
mrq | 93044829af | one more time (could have sworn I tested it with batch size > 1) | 2025-03-07 19:14:33 -06:00
mrq | 6cea840710 | oops | 2025-03-07 18:57:25 -06:00
mrq | dbd34b6430 | add specialized calc_loss because schizo | 2025-03-07 18:44:11 -06:00
mrq | 8d848ed549 | handle case of dropping cond for segment mask | 2025-03-07 14:11:58 -06:00
mrq | 89e52b9877 | ugh | 2025-03-07 13:55:57 -06:00
mrq | 6afc2b7526 | gut feeling to change the attention mask | 2025-03-07 13:51:59 -06:00
mrq | 91ede71cf0 | ugh | 2025-03-06 17:19:27 -06:00
mrq | 2dd80a03ff | stuff for interfacing with the loss scaler value (because I want to cap it) | 2025-03-06 17:07:29 -06:00
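Capping the loss scaler (2dd80a03ff) maps cleanly onto `torch.cuda.amp.GradScaler`, whose `update()` accepts an explicit new scale. A minimal sketch, assuming PyTorch AMP is the scaler in question; the cap value is illustrative, not the repo's:

```python
import torch

scaler = torch.cuda.amp.GradScaler()
LOSS_SCALE_CAP = 65536.0  # illustrative cap, not a value from the repo

def amp_step(optimizer, loss):
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    # update() accepts an explicit new scale, so the cap is one extra call
    if scaler.get_scale() > LOSS_SCALE_CAP:
        scaler.update(LOSS_SCALE_CAP)
```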
mrq | a30dffcca7 | wandb additions (to-do eventually: upload samples as artifacts) | 2025-03-06 15:44:40 -06:00
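The to-do in a30dffcca7 (upload samples as artifacts) would go through the standard wandb Artifact API. A sketch with illustrative project name and file path:

```python
import wandb

run = wandb.init(project="vall-e")  # assumed project name

# bundle generated audio into an artifact and attach it to the run
artifact = wandb.Artifact("eval-samples", type="audio")
artifact.add_file("samples/out.wav")  # hypothetical sample path
run.log_artifact(artifact)
```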
mrq | ec87308d75 | final tweaks before training this meme 44khz model for the 3rd time | 2025-03-06 15:31:15 -06:00
mrq | 5cd71ef238 | QoL so I can stop having to manually inject different configs | 2025-03-06 14:48:14 -06:00
mrq | 0d809561c6 | accuracy at k=1 and k=80, because I'm probably dumb for having k=10 as the default since it does not represent any use case | 2025-03-05 16:35:34 -06:00
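Top-k token accuracy, the metric 0d809561c6 switches to k=1 and k=80, fits in a few lines; this is the generic formulation, not the repo's implementation:

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    # logits: [N, vocab], targets: [N]
    topk = logits.topk(k, dim=-1).indices              # [N, k] candidate ids
    hits = (topk == targets.unsqueeze(-1)).any(dim=-1) # [N] bool per position
    return hits.float().mean().item()
```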
mrq | 2fb2b732fc | wow that was fast | 2025-03-04 23:17:18 -06:00
mrq | 462f71e2f7 | ugh | 2025-03-04 14:57:00 -06:00
mrq | 1cd24f3381 | a birdie tells me I should probably use a different optimizer (also preliminary support for native sparse attention, but I don't know if I'll use it) | 2025-03-04 14:53:02 -06:00
mrq | 0451f75e33 | now that the new model seems a little more promising, I can re-document things non-cynically | 2025-03-03 13:21:41 -06:00
mrq | 3f1070f575 | tweaks | 2025-03-02 22:36:25 -06:00
mrq | 4afa4ccce5 | at wits' end (perhaps the semantic token approach is the toughest pill to swallow) | 2025-03-01 21:03:25 -06:00
mrq | 1d3290b023 | could have sworn this worked before; might have broken it when I decoupled from omegaconf | 2025-03-01 19:30:26 -06:00
mrq | 17094b8002 | reticulating splines | 2025-03-01 17:48:51 -06:00
mrq | 56f8be4d62 | lol | 2025-02-28 22:15:37 -06:00
mrq | ddc49c89c5 | the learning rate scheduler pill is a tough pill to swallow | 2025-02-28 22:12:19 -06:00
mrq | b97faa8173 | fixes... | 2025-02-28 18:53:07 -06:00
mrq | 4e7d885542 | lol | 2025-02-28 18:06:41 -06:00
mrq | a174c33db6 | a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) | 2025-02-28 17:56:50 -06:00
mrq | 09d82a26fe | ugh | 2025-02-28 01:06:38 -06:00
mrq | 93feb5660f | do not like that | 2025-02-27 23:59:56 -06:00
mrq | f4f435d7f5 | when you already had these ideas to stabilize training but you just ignored them | 2025-02-27 23:39:20 -06:00
mrq | 0a45c9c042 | fix attention backend not being used | 2025-02-27 21:38:38 -06:00
mrq | b8e9f3d785 | maybe this will work | 2025-02-27 20:42:12 -06:00
mrq | 01e96bafc9 | ugh | 2025-02-27 19:05:32 -06:00
mrq | eff180248c | decoupled the llama backend to avoid any funny changes from transformers; removed the other backends since I don't think I'll ever bother using them | 2025-02-27 19:00:37 -06:00
mrq | ceecac6ffe | I think I made resp_parallel_training=True faster with loss factoring? | 2025-02-26 23:13:32 -06:00
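A hedged guess at what "loss factoring" in ceecac6ffe means: instead of looping over codebook levels and summing per-level cross-entropy, fold every level into one fused call. Shapes and names are assumptions, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def parallel_ce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: [B, L, T, vocab] for L codebook levels trained in parallel
    # targets: [B, L, T] matching token ids
    B, L, T, V = logits.shape
    # one cross_entropy over all levels replaces a per-level Python loop
    return F.cross_entropy(logits.reshape(B * L * T, V), targets.reshape(-1))
```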
mrq | 06ef3daf3c | require a minimum duration of 1 second for training, because of my slop code auto-transposing that I don't wanna fix right now | 2025-02-26 22:00:33 -06:00
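The 1-second floor from 06ef3daf3c amounts to a simple dataset filter. A sketch, assuming (path, frame count) metadata and a 24 kHz sample rate; both are assumptions:

```python
MIN_DURATION_SECONDS = 1.0

def filter_by_duration(entries, sample_rate: int = 24_000):
    # entries: iterable of (path, num_frames) pairs (hypothetical shape)
    # keep only utterances at least one second long
    return [
        (path, frames) for path, frames in entries
        if frames / sample_rate >= MIN_DURATION_SECONDS
    ]
```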