Commit Graph

  • 7f4206a879 fixing an error I caught while fixing tortoise_tts, possibly actually load a LoRA if not passing a yaml/model mrq 2025-07-24 20:56:09 -0500
  • 98b357cc53 things i forgot to do last week now that some mental faculties were restored mrq 2025-05-30 22:56:07 -0500
  • 0cca4eb943 disable this cringe precheck for now since it causes problems mrq 2025-05-22 13:21:36 -0500
  • f12746b091 allow defining the default model name through env var, register nemo-larger in the model name list thing mrq 2025-05-21 16:50:59 -0500
  • e46d7ef2cb warn and ignore export when lora training because the state dict exported during training is wrong mrq 2025-05-20 23:38:10 -0500
  • fee02f4153 added option to explicitly load a lora without having to lobotomize yourself with creating a yaml just to do so mrq 2025-05-20 23:28:29 -0500
  • 5018ddb107 i don't know why this managed to escape my attention mrq 2025-05-20 15:13:21 -0500
  • b2b243e7e7 addresses #9 mrq 2025-05-05 13:03:44 -0500
  • 5fe01ffc6c more notes / re-enabled top-k/p samplers for new implementation mrq 2025-04-19 14:04:34 -0500 (see the top-k/top-p sketch after this list)
  • f8e1d110dc when you, uhh, for once use your main rig to test, forget to, and then port things back over mrq 2025-04-18 20:49:00 -0500
  • d9e18037cc new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further) mrq 2025-04-18 20:36:44 -0500
  • 98d1d8cb1e added some more notes, tweaks (RIP DAC, it's over) mrq 2025-04-17 20:24:40 -0500
  • 9e27d2e02e huggingface zerogpu cringe mrq 2025-04-16 15:25:45 -0500
  • 814146a5e0 more settings bloat because there seems to be instability with the encoder as-is mrq 2025-04-12 12:53:44 -0500
  • f144389920 the culprit killing newly trained models was initializing the level_weights............. mrq 2025-04-10 23:06:16 -0500
  • 6c6a34dd21 i can't be assed to test if the prior commit works so being explicit like this should help until i can be bothered to halt training just to test this mrq 2025-04-07 23:13:35 -0500
  • 6d42c9ae23 how foolish of me, not having a softmax as float32 (maybe addresses an emergent regression where bfloat16 training shits the bed where float16+loss scaling doesn't) mrq 2025-04-07 22:51:52 -0500 (see the float32 softmax sketch after this list)
  • d6cd848c32 goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again mrq 2025-04-06 21:05:29 -0500
  • 1e22519d94 diagnosed both hf/llama.cpp versions as probably just suffering from a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf) mrq 2025-04-05 22:05:39 -0500
  • c34763769a ugh mrq 2025-04-05 18:58:25 -0500
  • b6692ce3de ugh mrq 2025-04-05 18:20:46 -0500
  • 4a909ceff8 temp fix for vall_e.cpp demask scoring regression mrq 2025-04-05 11:04:26 -0500
  • 44260f7445 tweaks mrq 2025-04-05 10:27:07 -0500
  • 0ede3bfc12 updated vall_e.cpp, but i could have sworn it worked much better than this...... mrq 2025-04-05 01:22:51 -0500
  • 28d39ef962 should not be working late mrq 2025-04-03 23:32:58 -0500
  • bfe70e9d56 ugh mrq 2025-04-03 23:26:00 -0500
  • 2e93438867 reintroduced sampler_type = speaker because I think this might salvage the nemo model to have better speaker similarities mrq 2025-04-03 19:01:10 -0500
  • caad99ab78 fix for bsz>1 because I forgot the old implementation implicitly handles this mrq 2025-04-02 17:17:37 -0500
  • 068dbdb785 ugh mrq 2025-04-02 17:05:16 -0500
  • 0e995dbf2c is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because my ROCm setup has a botched SDPA for who knows what reason) mrq 2025-04-02 17:01:24 -0500
  • 7a0956863d oops mrq 2025-03-31 21:11:43 -0500
  • a1184586ef should never have trusted mse_loss, it never works mrq 2025-03-31 20:59:13 -0500
  • 99f251c768 slight tweaks to condition-less NS/SR mrq 2025-03-30 10:37:40 -0500
  • 478aea0e8c tweaks mrq 2025-03-28 19:49:54 -0500
  • 6ae282e090 re-added noise dataloader sampler whatever for the old implementation's other tasks that require it mrq 2025-03-28 15:07:06 -0500
  • 90b3509404 I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules mrq 2025-03-27 13:27:51 -0500
  • 2fd82a7a22 cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) mrq 2025-03-27 00:51:41 -0500
  • 4d777b5618 add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) mrq 2025-03-26 12:08:47 -0500
  • 09e9438941 ugh mrq 2025-03-25 23:24:01 -0500
  • 8641c87611 nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) mrq 2025-03-25 23:06:16 -0500
  • aa8b32d97e added more notes (although I could have sworn I have had more notes that i can't recall) mrq 2025-03-25 18:53:06 -0500
  • df5b870908 added remark about not using sliding attention mrq 2025-03-22 12:44:34 -0500
  • 02a8bcbe29 fixed errant index error (although it makes me wonder if my segmented masking is still flawed) mrq 2025-03-21 23:41:34 -0500
  • d1d91295b3 add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation......... mrq 2025-03-21 19:05:49 -0500
  • 589cfb0e18 yuge speedup because of a dumb oversight mrq 2025-03-20 17:39:41 -0500
  • 8068f24e35 cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... mrq 2025-03-20 15:56:15 -0500
  • 9a7458cf17 fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings mrq 2025-03-19 22:41:48 -0500
  • 61de653ad9 now causal training should work again mrq 2025-03-19 14:20:19 -0500
  • 85b9dd47c1 ugh mrq 2025-03-19 13:31:50 -0500
  • 81acd565b3 re-enable these mrq 2025-03-18 20:59:33 -0500
  • 5479d2eacc more tweaks to the new implementation (properly trim the len stuff to save some params, set the decoder's d_ffn expansion to 2 to maybe also make it faster, etc.) mrq 2025-03-18 19:34:37 -0500
  • 9a8a8e3195 off by one bateman mrq 2025-03-18 08:40:43 -0500
  • 0280e72257 ugh mrq 2025-03-17 21:49:45 -0500
  • b0dba9db07 this may bite me in the ass mrq 2025-03-17 21:46:50 -0500
  • 2dfef693c4 comments for clarity mrq 2025-03-16 11:30:23 -0500
  • c5475ebc91 another dataloader optimization mrq 2025-03-15 20:18:58 -0500
  • bee2688dea ugh mrq 2025-03-15 16:50:21 -0500
  • 2053580838 updated dataloader to hopefully reduce RAM usage mrq 2025-03-15 13:14:37 -0500
  • 9cfbf94b1c config-ify the len_loss_factor mrq 2025-03-14 20:30:48 -0500
  • ca8cc15271 more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce which audio backend to use now that I'm supporting other ones again // added easy top-sampler settings back for the new implementation) mrq 2025-03-14 20:18:25 -0500
  • 6ee505cffd fixed dac mrq 2025-03-12 23:17:27 -0500
  • ba5f3d19b4 use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, while the RVQ-targeted encoder/decoder doesn't (and some notes) mrq 2025-03-12 22:47:19 -0500
  • 2ccf1b5740 actually do duration prediction mrq 2025-03-11 22:14:54 -0500
  • 5c512717a6 len prediction for new model (and remove logit normalization since it kills inferencing) mrq 2025-03-11 20:33:09 -0500
  • 5f98543d4d ughh mrq 2025-03-10 21:18:57 -0500
  • 8ac03aac8a ugh mrq 2025-03-10 21:14:56 -0500
  • 5670fcb23f hopefully the final tweaks needed for this bastard of a model mrq 2025-03-10 20:59:11 -0500
  • 00d1fed217 another optimization (within the dataloader because the similar utterance sampler was mondo slow) mrq 2025-03-08 17:10:50 -0600
  • 5e9d1a5302 one more time one more time (this normalization isn't a spook) mrq 2025-03-07 19:32:42 -0600
  • 93044829af one more time (could have sworn i tested it with batch size > 1) mrq 2025-03-07 19:14:33 -0600
  • 6cea840710 oops mrq 2025-03-07 18:57:25 -0600
  • dbd34b6430 add specialized calc_loss because schizo mrq 2025-03-07 18:44:11 -0600
  • 8d848ed549 handle case of dropping cond for segment mask mrq 2025-03-07 14:11:58 -0600
  • 89e52b9877 ugh mrq 2025-03-07 13:55:57 -0600
  • 6afc2b7526 gut feeling to change the attention mask mrq 2025-03-07 13:51:59 -0600
  • 91ede71cf0 ugh mrq 2025-03-06 17:19:27 -0600
  • 2dd80a03ff stuff for interfacing with the loss scaler value (because I want to cap it) mrq 2025-03-06 17:07:29 -0600 (see the loss-scale cap sketch after this list)
  • a30dffcca7 wandb additions (to-do eventually, upload samples as artifacts) mrq 2025-03-06 15:44:40 -0600
  • ec87308d75 final tweaks before training this meme 44khz model for the 3rd time mrq 2025-03-06 15:31:15 -0600
  • 5cd71ef238 QoL so I can stop having to manually inject different configs mrq 2025-03-06 14:48:14 -0600
  • 0d809561c6 accuracy at k=1 and k=80, because I'm probably dumb for having k=10 as the default since it does not represent any use case mrq 2025-03-05 16:35:34 -0600 (see the top-k accuracy sketch after this list)
  • 2fb2b732fc wow that was fast mrq 2025-03-04 23:17:18 -0600
  • 462f71e2f7 ugh mrq 2025-03-04 14:57:00 -0600
  • 1cd24f3381 a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) mrq 2025-03-04 14:53:02 -0600
  • 0451f75e33 now that the new model seems a little more promising, i can re-document things non-cynically mrq 2025-03-03 13:21:41 -0600
  • 3f1070f575 tweaks mrq 2025-03-02 22:36:25 -0600
  • 4afa4ccce5 at wit's end (perhaps the semantic token approach is the toughest pill to swallow) mrq 2025-03-01 21:03:25 -0600
  • 1d3290b023 could have sworn this worked before, might have broken it when i decoupled from omegaconf mrq 2025-03-01 19:30:26 -0600
  • 17094b8002 reticulating splines mrq 2025-03-01 17:48:51 -0600
  • 56f8be4d62 lol mrq 2025-02-28 22:15:37 -0600
  • ddc49c89c5 the learning rate scheduler pill is a tough pill to swallow mrq 2025-02-28 22:12:19 -0600
  • b97faa8173 fixes... mrq 2025-02-28 18:53:07 -0600
  • 4e7d885542 lol mrq 2025-02-28 18:06:41 -0600
  • a174c33db6 a gorillionth time's the charm (aka: the encoder/decoder pill is a tough pill to swallow) mrq 2025-02-28 17:56:50 -0600
  • 09d82a26fe ugh mrq 2025-02-28 01:06:38 -0600
  • 93feb5660f do not like that mrq 2025-02-27 23:59:56 -0600
  • f4f435d7f5 when you already had these ideas to stabilize training but you just ignored them mrq 2025-02-27 23:39:20 -0600
  • 0a45c9c042 fix attention backend not being used mrq 2025-02-27 21:38:38 -0600
  • b8e9f3d785 maybe this will work mrq 2025-02-27 20:42:12 -0600
  • 01e96bafc9 ugh mrq 2025-02-27 19:05:32 -0600
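
Notes on Selected Commits

Commit 5fe01ffc6c re-enabled the top-k/top-p samplers for the new implementation. The standard filtering these names refer to masks logits outside the k highest and/or outside the smallest set whose probability mass exceeds p. A minimal PyTorch sketch, with an illustrative function name rather than the project's actual sampler API:

    import torch

    def top_k_top_p_filtering(logits: torch.Tensor, top_k: int = 0, top_p: float = 1.0) -> torch.Tensor:
        # mask everything below the k-th highest logit
        if top_k > 0:
            kth_value = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
            logits = logits.masked_fill(logits < kth_value, float("-inf"))
        # nucleus filtering: drop tokens once cumulative probability exceeds top_p
        if top_p < 1.0:
            sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
            cumulative_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
            to_remove = cumulative_probs > top_p
            # shift right by one so the first token crossing the threshold is kept
            to_remove[..., 1:] = to_remove[..., :-1].clone()
            to_remove[..., 0] = False
            mask = torch.zeros_like(to_remove).scatter(-1, sorted_indices, to_remove)
            logits = logits.masked_fill(mask, float("-inf"))
        return logits

Sampling then proceeds as usual, e.g. torch.multinomial(torch.softmax(filtered_logits, dim=-1), 1).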
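
Commit 6d42c9ae23 casts the softmax to float32. bfloat16 has only 8 mantissa bits, so computing the softmax normalization in it can quietly destabilize training in a way float16 with loss scaling does not. A minimal sketch of the usual fix, assuming PyTorch attention scores (the helper name is hypothetical):

    import torch
    import torch.nn.functional as F

    def softmax_fp32(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
        # compute the softmax in float32 regardless of the ambient dtype,
        # then cast back to match the rest of the graph
        return F.softmax(scores, dim=dim, dtype=torch.float32).to(scores.dtype)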
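
Commit 2dd80a03ff adds plumbing to cap the loss scaler value. With a stock torch.cuda.amp.GradScaler, one way to do that is to clamp the scale after each update; a minimal sketch, where MAX_SCALE is an assumed hyperparameter and the project's trainer may wrap the scaler differently:

    import torch

    MAX_SCALE = 2.0 ** 16  # assumed cap; tune per setup

    scaler = torch.cuda.amp.GradScaler()

    def training_step(optimizer, loss):
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        # keep the dynamic loss scale from growing unboundedly
        if scaler.get_scale() > MAX_SCALE:
            scaler.update(new_scale=MAX_SCALE)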
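
Commit 0d809561c6 switches the reported metric to accuracy at k=1 and k=80. Top-k accuracy counts a position as correct when the target id is among the k highest logits; a minimal sketch, assuming logits of shape (..., vocab) and integer targets:

    import torch

    def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 1) -> torch.Tensor:
        # a hit is any position whose target lands in the k highest-scoring classes
        topk_indices = logits.topk(k, dim=-1).indices
        hits = (topk_indices == targets.unsqueeze(-1)).any(dim=-1)
        return hits.float().mean()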