Commit Graph

198 Commits

Author SHA1 Message Date
mrq
e029a8804d ironically none of this cruft gets the loss lower than the original way 2025-02-12 11:17:00 -06:00
mrq
4b31f5c808 this seems preferable 2025-02-12 00:36:50 -06:00
mrq
04fef5dad5 agony 2025-02-12 00:18:24 -06:00
mrq
47eb498046 more tweaks 2025-02-06 23:26:26 -06:00
mrq
79c504c278 cleaned up encode/decode functions to make them a little more coherent, added option to batch encode/decode (would have been very nice in the past, but this should speed things up for me when i fall for the latest meme codec) 2025-02-05 20:54:31 -06:00
mrq
69c1d2991f updated mixtral backend (need this for something else) 2025-01-20 21:50:56 -06:00
mrq
1a26f789a5 added option to playback audio directly, removed no-phonemize option since I swear it worked in testing but it doesn't actually work 2025-01-12 21:52:49 -06:00
mrq
3ab11bdc7b oops 2025-01-05 23:53:17 -06:00
mrq
b445f4abb6 experimental 2025-01-05 19:05:00 -06:00
mrq
2e6a7625e4 experimental 2025-01-05 12:47:03 -06:00
mrq
c2e17e287b really shoddy voice conversion implementation (it sort of works...) 2024-12-16 22:54:53 -06:00
mrq
8515038968 imagine my disappointment when the epoch finished just for it to throw an exception 2024-12-16 18:28:01 -06:00
mrq
9a62e3b824 APOLLO cringe (doesn't want to work with deepspeed) 2024-12-12 00:31:58 -06:00
mrq
6468e5d124 lol 2024-12-11 19:10:32 -06:00
mrq
3ef8894290 oops 2024-12-08 15:24:21 -06:00
mrq
1d460b9fe3 logic fixes, I feel like output is better? (also NAR can have a temperature, I imagine it couldn't because it was having a causal masked passed to it for the longest time before I caught it a month ago) 2024-12-08 14:52:47 -06:00
mrq
5d80a2d0d4 fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now 2024-12-07 19:21:05 -06:00
mrq
34a66e1052 agnostified KD 2024-12-06 23:53:46 -06:00
mrq
23d402bf01 added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......) 2024-12-05 23:05:52 -06:00
mrq
93d27be539 rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting) 2024-12-04 20:31:44 -06:00
mrq
9dff68c0c5 NAR-len tweaks (remasks a small amount of tokens per step, it seems to help with reducing the number of steps needed some of the time?, disable CFG for the first half to speed things up) 2024-12-04 09:30:29 -06:00
mrq
cf97560e70 minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now 2024-12-03 19:40:05 -06:00
mrq
67f7bad168 added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated) 2024-11-20 14:22:12 -06:00
mrq
b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len) 2024-11-19 18:51:17 -06:00
mrq
190a917b3e I did it. 2024-11-19 12:24:33 -06:00
mrq
0e621354e7 cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model) 2024-11-19 10:30:05 -06:00
mrq
5ba80686e1 two weeks of agony concludes 2024-11-18 21:29:28 -06:00
mrq
2b29790173 oops 2024-11-18 14:12:26 -06:00
mrq
6cfdf94bf9 swap priority to use nar-len if available, added notes 2024-11-18 09:40:04 -06:00
mrq
069b27570f set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things) 2024-11-17 17:04:07 -06:00
mrq
23fdba0c98 tweaks and changes 2024-11-16 15:49:06 -06:00
mrq
39096f8ff3 redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........) 2024-11-14 22:17:47 -06:00
mrq
e412e98125 ugh 2024-11-14 07:34:22 -06:00
mrq
910033343c overhauled how the right resp level / classifier gets picked to avoid cringemath 2024-11-13 13:31:17 -06:00
mrq
269648605e move NAR-len rvq level 0 to separate embedding 2024-11-13 11:38:58 -06:00
mrq
8286aa54c8 do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so 2024-11-13 09:07:10 -06:00
mrq
0f2584eba7 new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?) 2024-11-12 22:30:09 -06:00
mrq
663f07038d haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output)) 2024-11-12 16:41:58 -06:00
mrq
b09328069e actually do CFG sampling for base AR+NAR tasks 2024-11-12 13:42:39 -06:00
mrq
2495a7ef67 Fixed STT in the web UI 2024-11-12 12:49:53 -06:00
mrq
8927bad7bc actually fixed rep pen (for ar and nar, it seems to help with nar unmasking) 2024-11-11 21:40:19 -06:00
mrq
b1f4db39c8 threw in CFG sampling for normal model as well to experiment with 2024-11-11 20:27:38 -06:00
mrq
2f56696506 overhauled inference/sampler kwargs to stop being a bloated mess 2024-11-11 20:21:16 -06:00
mrq
a748e223ce tweaks 2024-11-11 12:40:41 -06:00
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
9def34cd66 lol 2024-11-10 12:48:41 -06:00
mrq
9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq
c127c4e488 'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough) 2024-11-07 21:19:14 -06:00
mrq
d229725c76 more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.) 2024-11-03 18:31:28 -06:00