819 Commits

Author SHA1 Message Date
mrq
2053580838 updated dataloader to hopefully reduce RAM usage 2025-03-15 13:14:37 -05:00
mrq
9cfbf94b1c config-ify the len_loss_factor 2025-03-14 20:30:48 -05:00
mrq
ca8cc15271 more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) 2025-03-14 20:18:25 -05:00
mrq
6ee505cffd fixed dac 2025-03-12 23:17:27 -05:00
mrq
ba5f3d19b4 use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesn't (and some notes) 2025-03-12 22:47:19 -05:00
mrq
2ccf1b5740 actually do duration prediction 2025-03-11 22:14:54 -05:00
mrq
5c512717a6 len prediction for new model (and remove logit normalization since it kills inferencing) 2025-03-11 20:33:09 -05:00
mrq
5f98543d4d ughh 2025-03-10 21:18:57 -05:00
mrq
8ac03aac8a ugh 2025-03-10 21:14:56 -05:00
mrq
5670fcb23f hopefully the final tweaks needed for this bastard of a model 2025-03-10 20:59:11 -05:00
mrq
00d1fed217 another optimization (within the dataloader because the similar utterance sampler was mondo slow) 2025-03-08 17:10:50 -06:00
mrq
5e9d1a5302 one more time one more time (this normalization isn't a spook) 2025-03-07 19:32:42 -06:00
mrq
93044829af one more time (could have sworn i tested it with batch size > 1) 2025-03-07 19:14:33 -06:00
mrq
6cea840710 oops 2025-03-07 18:57:25 -06:00
mrq
dbd34b6430 add specialized calc_loss because schizo 2025-03-07 18:44:11 -06:00
mrq
8d848ed549 handle case of dropping cond for segment mask 2025-03-07 14:11:58 -06:00
mrq
89e52b9877 ugh 2025-03-07 13:55:57 -06:00
mrq
6afc2b7526 gut feeling to change the attention mask 2025-03-07 13:51:59 -06:00
mrq
91ede71cf0 ugh 2025-03-06 17:19:27 -06:00
mrq
2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it) 2025-03-06 17:07:29 -06:00
mrq
a30dffcca7 wandb additions (to-do eventually, upload samples as artifacts) 2025-03-06 15:44:40 -06:00
mrq
ec87308d75 final tweaks before training this meme 44khz model for the 3rd time 2025-03-06 15:31:15 -06:00
mrq
5cd71ef238 QoL so I can stop having to manually inject different configs 2025-03-06 14:48:14 -06:00
mrq
0d809561c6 accuracy k=1 and k=80 because im probably dumb for k=10 as the default since it does not represent any usecase 2025-03-05 16:35:34 -06:00
mrq
2fb2b732fc wow that was fast 2025-03-04 23:17:18 -06:00
mrq
462f71e2f7 ugh 2025-03-04 14:57:00 -06:00
mrq
1cd24f3381 a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) 2025-03-04 14:53:02 -06:00
mrq
0451f75e33 now that the new model seems a little more promising, i can re-document things non-cynically 2025-03-03 13:21:41 -06:00
mrq
3f1070f575 tweaks 2025-03-02 22:36:25 -06:00
mrq
4afa4ccce5 at wits end (perhaps the semantic token approach is the toughest pill to swallow) 2025-03-01 21:03:25 -06:00
mrq
1d3290b023 could have sworn this worked before, might have broke it when i decoupled from omegaconf 2025-03-01 19:30:26 -06:00
mrq
17094b8002 reticulating splines 2025-03-01 17:48:51 -06:00
mrq
56f8be4d62 lol 2025-02-28 22:15:37 -06:00
mrq
ddc49c89c5 the learning rate scheduler pill is a tough pill to swallow 2025-02-28 22:12:19 -06:00
mrq
b97faa8173 fixes... 2025-02-28 18:53:07 -06:00
mrq
4e7d885542 lol 2025-02-28 18:06:41 -06:00
mrq
a174c33db6 a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) 2025-02-28 17:56:50 -06:00
mrq
09d82a26fe ugh 2025-02-28 01:06:38 -06:00
mrq
93feb5660f do not like that 2025-02-27 23:59:56 -06:00
mrq
f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them 2025-02-27 23:39:20 -06:00
mrq
0a45c9c042 fix attention backend not being used 2025-02-27 21:38:38 -06:00
mrq
b8e9f3d785 maybe this will work 2025-02-27 20:42:12 -06:00
mrq
01e96bafc9 ugh 2025-02-27 19:05:32 -06:00
mrq
eff180248c decoupled llama backend to avoid any funny changes from transformers, removed other backends since i dont think i'll ever bother using them 2025-02-27 19:00:37 -06:00
mrq
ceecac6ffe I think I made resp_parallel_training=True faster with loss factoring? 2025-02-26 23:13:32 -06:00
mrq
06ef3daf3c require minimum of 1 second durations for training because of my slop code auto-transposing that I don't wanna fix right now 2025-02-26 22:00:33 -06:00
mrq
cbd4d7d7f4 ugh 2025-02-26 21:31:10 -06:00
mrq
2ea387c08a segregated experimental changes into its own streamlined file to avoid breaking the existing model, and it can pivot to the cleaned up code if it actually works (nothing is working) 2025-02-26 21:26:13 -06:00
mrq
7d2e64630c lol 2025-02-26 10:49:06 -06:00
mrq
95da4e9405 made muon actually work by actually utilizing param groups (thanks APOLLO for reminding me this is the sane way to handle this split) 2025-02-26 10:39:13 -06:00