7f4206a879  fixing an error I caught while fixing tortoise_tts, possibly actually load a LoRA if not passing a yaml/model (mrq, 2025-07-24 20:56:09 -0500)
98b357cc53  things i forgot to do last week now that some mental faculties were restored (mrq, 2025-05-30 22:56:07 -0500)
0cca4eb943  disable this cringe precheck for now since it causes problems (mrq, 2025-05-22 13:21:36 -0500)
f12746b091  allow defining the default model name through env var, register nemo-larger in the model name list thing (mrq, 2025-05-21 16:50:59 -0500)
e46d7ef2cb  warn and ignore export when lora training because the state dict exported during training is wrong (mrq, 2025-05-20 23:38:10 -0500)
fee02f4153  added option to explicitly load a lora without having to lobotomize yourself with creating a yaml just to do so (mrq, 2025-05-20 23:28:29 -0500)
5018ddb107  i dont know why this managed to escape my attention (mrq, 2025-05-20 15:13:21 -0500)
5fe01ffc6c  more notes / re-enabled top-k/p samplers for new implementation (mrq, 2025-04-19 14:04:34 -0500)
f8e1d110dc  when you uhh when you for once use your main rig to test and forgot to and when you port things back over (mrq, 2025-04-18 20:49:00 -0500)
d9e18037cc  new implementation tweaks and fixes to make it actually better (there were a lot of badwrong things being done that harmed the output quality, will evaluate the model further) (mrq, 2025-04-18 20:36:44 -0500)
98d1d8cb1e  added some more notes, tweaks (RIP DAC, it's over) (mrq, 2025-04-17 20:24:40 -0500)
814146a5e0  more settings bloat because there seems to be instability with the encoder as-is (mrq, 2025-04-12 12:53:44 -0500)
f144389920  the culprit was initializing the level_weights for killing newly trained models (mrq, 2025-04-10 23:06:16 -0500)
6c6a34dd21  i can't be assed to test if the prior commit works so being explicit like this should help until i can be bothered to halt training just to test this (mrq, 2025-04-07 23:13:35 -0500)
6d42c9ae23  how foolish of me, not having a softmax as float32 (maybe addresses an emergent regression where bfloat16 training shits the bed where float16+loss scaling doesnt) (mrq, 2025-04-07 22:51:52 -0500)
d6cd848c32  goodbye nvidia/audio-codec-44khz, crossed fingers for DAC again (mrq, 2025-04-06 21:05:29 -0500)
1e22519d94  diagnosed both hf/llama.cpp versions to probably just being a faulty export method (to-do: migrate vall_e.models.base to vall_e.export --hf) (mrq, 2025-04-05 22:05:39 -0500)
2e93438867  reintroduced sampler_type = speaker because I think this might salvage the nemo model to have better speaker similarities (mrq, 2025-04-03 19:01:10 -0500)
caad99ab78  fix for bsz>1 because I forgot the old implementation implicitly handles this (mrq, 2025-04-02 17:17:37 -0500)
0e995dbf2c  is this my last cope (falling back to explicit duration prediction, as this regression just won't go away) (also the smaller model was lobotomized because of my ROCm setup having a botched SDPA for who knows why) (mrq, 2025-04-02 17:01:24 -0500)
6ae282e090  re-added noise dataloader sampler whatever for the old implementation's other tasks that require it (mrq, 2025-03-28 15:07:06 -0500)
90b3509404  I'll just cope and say I cannot apply segmented attention masks to the smaller model as it's too trained on not doing it, and the regression came from dumb python aliasing rules (mrq, 2025-03-27 13:27:51 -0500)
2fd82a7a22  cannot get segmented mask to actually work without gradients exploding (need to find a different way to do duration prediction...) (mrq, 2025-03-27 00:51:41 -0500)
4d777b5618  add remark that segmented attention actually might be broken (for some reason this only emerged recently, need to investigate) (mrq, 2025-03-26 12:08:47 -0500)
8641c87611  nothing could go wrong part 2 (reverted and rewrote commits since there was a nasty regression) (mrq, 2025-03-25 23:06:16 -0500)
aa8b32d97e  added more notes (although I could have sworn I have had more notes that i can't recall) (mrq, 2025-03-25 18:53:06 -0500)
df5b870908  added remark about not using sliding attention (mrq, 2025-03-22 12:44:34 -0500)
02a8bcbe29  fixed errant index error (although it makes me wonder if my segmented masking is still flawed) (mrq, 2025-03-21 23:41:34 -0500)
d1d91295b3  add segmented sliding attention, also found a bug with prom-less segments in the attention mask generation (mrq, 2025-03-21 19:05:49 -0500)
589cfb0e18  yuge speedup because of a dumb oversight (mrq, 2025-03-20 17:39:41 -0500)
8068f24e35  cleaned up parallel nar, i think it's slightly faster but even the smallest model is still slower than ar+nar-len-llama-8... (mrq, 2025-03-20 15:56:15 -0500)
9a7458cf17  fixed inferencing since I did delete the len_emb, some more notes on the model since it seems I just had bad experimental settings (mrq, 2025-03-19 22:41:48 -0500)
61de653ad9  now causal training should work again (mrq, 2025-03-19 14:20:19 -0500)
5479d2eacc  more tweaks to the new implementation (properly trim the len stuff to save some params, decoder to d_ffn expansion to 2 to maybe also make it faster, etc.) (mrq, 2025-03-18 19:34:37 -0500)
9a8a8e3195  off by one bateman (mrq, 2025-03-18 08:40:43 -0500)
2053580838  updated dataloader to hopefully reduce RAM usage (mrq, 2025-03-15 13:14:37 -0500)
9cfbf94b1c  config-ify the len_loss_factor (mrq, 2025-03-14 20:30:48 -0500)
ca8cc15271  more tweaks (vall_e.webui --yaml still breaks things, --model needs to deduce what audio backend now that im supporting other ones again // added easy top-sampler settings back for new implementation) (mrq, 2025-03-14 20:18:25 -0500)
ba5f3d19b4  use the FSQ-targeted encoder/decoder wholly as it works for EnCodec too, as the RVQ-targeted encoder/decoder doesn't (and some notes) (mrq, 2025-03-12 22:47:19 -0500)
2ccf1b5740  actually do duration prediction (mrq, 2025-03-11 22:14:54 -0500)
5c512717a6  len prediction for new model (and remove logit normalization since it kills inferencing) (mrq, 2025-03-11 20:33:09 -0500)
1cd24f3381  a birdie tells me i should probably use a different optimizer (also preliminary support for native sparse attention but I don't know if I'll use it) (mrq, 2025-03-04 14:53:02 -0600)
0451f75e33  now that the new model seems a little more promising, i can re-document things non-cynically (mrq, 2025-03-03 13:21:41 -0600)