Commit Graph

412 Commits

Author SHA1 Message Date
mrq
8515038968 imagine my disappointment when the epoch finished just for it to throw an exception 2024-12-16 18:28:01 -06:00
mrq
4a65ac9eb7 oops 2024-12-15 17:21:51 -06:00
mrq
9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq
cddf8ca814 sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them) 2024-12-11 22:45:38 -06:00
mrq
6468e5d124 lol 2024-12-11 19:10:32 -06:00
mrq
3ef8894290 oops 2024-12-08 15:24:21 -06:00
mrq
1d460b9fe3 logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal masked passed to it for the longest time before I caught it a month ago) 2024-12-08 14:52:47 -06:00
mrq
5d80a2d0d4 fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now 2024-12-07 19:21:05 -06:00
mrq
61ed662856 ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode) 2024-12-07 12:31:54 -06:00
mrq
34a66e1052 agnostified KD 2024-12-06 23:53:46 -06:00
mrq
953d3eb030 ugh 2024-12-06 22:35:30 -06:00
mrq
42fafbaaca actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps) 2024-12-06 21:55:20 -06:00
mrq
23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) 2024-12-05 23:05:52 -06:00
mrq
93d27be539 rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
mrq
9dff68c0c5 NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up) 2024-12-04 09:30:29 -06:00
mrq
cf97560e70 minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now 2024-12-03 19:40:05 -06:00
mrq
ca31da0a95 sageattn (forgot to bother with testing this the other day, seems ifne) 2024-12-03 15:14:57 -06:00
mrq
84a05acb6d touch ups in docs 2024-12-02 19:10:42 -06:00
mrq
dcaf38b359 fixed training tqdm being stubborn 2024-11-23 09:45:23 -06:00
mrq
41d7c30ea5 added much cleaner non-causal mask generation 2024-11-22 19:43:32 -06:00
mrq
c99a74e834 actually generate a causal mask because it seems sometimes it does not actually generate one because it makes assumptions 2024-11-22 18:30:24 -06:00
mrq
ccee5fc11c that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one 2024-11-22 16:51:50 -06:00
mrq
4aa685e749 what has science done 2024-11-22 16:45:40 -06:00
mrq
147219a5e0 huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks) 2024-11-22 13:44:43 -06:00
mrq
24d888c47c temporarily dropping support for xformers because it's breaking when using an attention mask (which i dont remember commenting it out when being passed), default to not use wandb because it's being a pain when doing tests and not actual sessionsS) 2024-11-22 11:29:12 -06:00
mrq
8aafae91fd dont use timeembedding 2024-11-21 23:14:52 -06:00
mrq
2cef97e43f cleanup 2024-11-21 23:08:43 -06:00
mrq
67f7bad168 added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated) 2024-11-20 14:22:12 -06:00
mrq
b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len) 2024-11-19 18:51:17 -06:00
mrq
190a917b3e I did it. 2024-11-19 12:24:33 -06:00
mrq
0e621354e7 cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model) 2024-11-19 10:30:05 -06:00
mrq
5ba80686e1 two weeks of agony concludes 2024-11-18 21:29:28 -06:00
mrq
2b29790173 oops 2024-11-18 14:12:26 -06:00
mrq
6cfdf94bf9 swap priority to use nar-len if available, added notes 2024-11-18 09:40:04 -06:00
mrq
069b27570f set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things) 2024-11-17 17:04:07 -06:00
mrq
88d840218d default set cfg strength to 3.0 since the reference model is updated 2024-11-17 10:23:40 -06:00
mrq
a3e1fa3518 ugh 2024-11-17 09:28:33 -06:00
mrq
23fdba0c98 tweaks and changes 2024-11-16 15:49:06 -06:00
mrq
2fbeacfe92 ugh 2024-11-14 22:18:33 -06:00
mrq
39096f8ff3 redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) 2024-11-14 22:17:47 -06:00
mrq
e412e98125 ugh 2024-11-14 07:34:22 -06:00
mrq
c00fc18b62 actually use the right embedding for nar-len 2024-11-13 18:04:04 -06:00
mrq
3ea8a610d6 fix STT 2024-11-13 14:27:15 -06:00
mrq
910033343c overhauled how the right resp level / classifier gets picked to avoid cringemath 2024-11-13 13:31:17 -06:00
mrq
269648605e move NAR-len rvq level 0 to separate embedding 2024-11-13 11:38:58 -06:00
mrq
be83ddabaa better causal-ness for split loss calc, and also do masking for NAR-len for it 2024-11-13 10:17:52 -06:00
mrq
6b76419123 ugh 2024-11-13 09:54:20 -06:00
mrq
ad7cfffc00 NAR-len RVQ-0 was being trained causally............. 2024-11-13 09:43:50 -06:00
mrq
8286aa54c8 do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so 2024-11-13 09:07:10 -06:00
mrq
0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq
663f07038d haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output)) 2024-11-12 16:41:58 -06:00
mrq
b09328069e actually do CFG sampling for base AR+NAR tasks 2024-11-12 13:42:39 -06:00
mrq
2495a7ef67 Fixed STT in the web UI 2024-11-12 12:49:53 -06:00
mrq
8927bad7bc actually fixed rep pen (for ar and nar, it seems to help with nar unmasking) 2024-11-11 21:40:19 -06:00
mrq
b1f4db39c8 threw in CFG sampling for normal model as well to experiment with 2024-11-11 20:27:38 -06:00
mrq
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq
a748e223ce tweaks 2024-11-11 12:40:41 -06:00
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
9def34cd66 lol 2024-11-10 12:48:41 -06:00
mrq
9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq
ad7e290a5e ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup) 2024-11-09 19:40:02 -06:00
mrq
943fe70c10 I don't know why this fixes an assert thrown but it does 2024-11-09 19:04:13 -06:00
mrq
f50d92ba6c Almost made a mistake 2024-11-09 18:12:54 -06:00
mrq
c6a38693a2 This better work 2024-11-09 18:04:59 -06:00
mrq
8b3d1cf70a Something's Wrong 2024-11-09 15:07:43 -06:00
mrq
dcd5fecff3 some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audo after 3 seconds or so) 2024-11-09 12:12:46 -06:00
mrq
69b0b3b854 set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp 2024-11-09 00:11:16 -06:00
mrq
5a09a5f6e9 I forgot about the time embedding... 2024-11-08 22:46:26 -06:00
mrq
811b15d280 I suppose I just have a shit training method since the sampler is as solid as I can get it............... 2024-11-08 22:05:41 -06:00
mrq
13b54953bd agony 2024-11-08 13:34:39 -06:00
mrq
c127c4e488 'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough) 2024-11-07 21:19:14 -06:00
mrq
e108c54daf new NAR-len training paradigm...... 2024-11-07 11:32:11 -06:00
mrq
ed174c589e ugh 2024-11-07 09:19:21 -06:00
mrq
d13ab00ad8 one more note 2024-11-07 09:11:21 -06:00
mrq
5698188824 あたしって、ほんとバカ 2024-11-07 09:10:18 -06:00
mrq
77ff23e319 repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more) 2024-11-06 23:29:53 -06:00
mrq
105ed51159 I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something) 2024-11-06 19:17:12 -06:00
mrq
aefe8fcdad UGH 2024-11-05 22:13:58 -06:00
mrq
9e65e05e83 more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm) 2024-11-04 18:00:33 -06:00
mrq
c83670c38c Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default) 2024-11-03 19:19:15 -06:00
mrq
d229725c76 more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.) 2024-11-03 18:31:28 -06:00
mrq
aee08b7307 changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system) 2024-11-03 09:58:29 -06:00
mrq
3826f9bae4 saner mask creation? (it doesnt matter, kv cache wont work) 2024-11-02 21:00:21 -05:00
mrq
ded746e157 very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough) 2024-11-02 11:49:05 -05:00
mrq
ec79230965 shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation) 2024-11-01 21:30:06 -05:00
mrq
fb8faa295b actually float16(+AMP) and layerskip is bad and will kill the model...... 2024-11-01 18:36:44 -05:00
mrq
9b6c57bc57 third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router) 2024-11-01 12:50:37 -05:00
mrq
76ebef45dc off-by-one... 2024-10-31 13:24:48 -05:00
mrq
b63293cbbe ugh 2024-10-30 22:49:11 -05:00
mrq
a22534e8f4 layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this) 2024-10-30 20:05:45 -05:00
mrq
ccf71dc1b6 added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided 2024-10-25 22:15:15 -05:00
mrq
a96f5aee32 adjusted how i want to pass eval kwargs 2024-10-25 20:38:09 -05:00
mrq
92e6bff6dc actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time 2024-10-23 00:03:35 -05:00
mrq
8920e5e86b actually have beam_width in the webUI work 2024-10-22 22:06:22 -05:00
mrq
910571ad34 too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time 2024-10-22 20:13:54 -05:00
mrq
8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq
1a02cd5bce modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file 2024-10-21 19:52:02 -05:00
mrq
71731ed785 added prefixing with silence (was to test something, currently hidden under cfg.experimental=True) 2024-10-18 17:19:52 -05:00
mrq
fc8dfd8617 made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar) 2024-10-18 16:55:00 -05:00