|
67f7bad168
|
added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)
|
2024-11-20 14:22:12 -06:00 |
|
|
b1369e7824
|
better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)
|
2024-11-19 18:51:17 -06:00 |
|
|
190a917b3e
|
I did it.
|
2024-11-19 12:24:33 -06:00 |
|
|
0e621354e7
|
cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model)
|
2024-11-19 10:30:05 -06:00 |
|
|
5ba80686e1
|
two weeks of agony concludes
|
2024-11-18 21:29:28 -06:00 |
|
|
2b29790173
|
oops
|
2024-11-18 14:12:26 -06:00 |
|
|
6cfdf94bf9
|
swap priority to use nar-len if available, added notes
|
2024-11-18 09:40:04 -06:00 |
|
|
069b27570f
|
set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)
|
2024-11-17 17:04:07 -06:00 |
|
|
23fdba0c98
|
tweaks and changes
|
2024-11-16 15:49:06 -06:00 |
|
|
39096f8ff3
|
redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)
|
2024-11-14 22:17:47 -06:00 |
|
|
e412e98125
|
ugh
|
2024-11-14 07:34:22 -06:00 |
|
|
910033343c
|
overhauled how the right resp level / classifier gets picked to avoid cringemath
|
2024-11-13 13:31:17 -06:00 |
|
|
269648605e
|
move NAR-len rvq level 0 to separate embedding
|
2024-11-13 11:38:58 -06:00 |
|
|
8286aa54c8
|
do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so
|
2024-11-13 09:07:10 -06:00 |
|
|
0f2584eba7
|
new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)
|
2024-11-12 22:30:09 -06:00 |
|
|
663f07038d
|
haha... (do not create a token dropout/noise mask when not training (this sadly didnt fix NAR-len output))
|
2024-11-12 16:41:58 -06:00 |
|
|
b09328069e
|
actually do CFG sampling for base AR+NAR tasks
|
2024-11-12 13:42:39 -06:00 |
|
|
2495a7ef67
|
Fixed STT in the web UI
|
2024-11-12 12:49:53 -06:00 |
|
|
8927bad7bc
|
actually fixed rep pen (for ar and nar, it seems to help with nar unmasking)
|
2024-11-11 21:40:19 -06:00 |
|
|
b1f4db39c8
|
threw in CFG sampling for normal model as well to experiment with
|
2024-11-11 20:27:38 -06:00 |
|
|
2f56696506
|
overhauled inference/sampler kwargs to stop being a bloated mess
|
2024-11-11 20:21:16 -06:00 |
|
|
a748e223ce
|
tweaks
|
2024-11-11 12:40:41 -06:00 |
|
|
48490757da
|
fixes
|
2024-11-10 20:37:50 -06:00 |
|
|
9def34cd66
|
lol
|
2024-11-10 12:48:41 -06:00 |
|
|
9cb0b6901b
|
unified nar.py into ar_nar.py
|
2024-11-10 12:19:48 -06:00 |
|
|
a9d2faf2d7
|
all I can do now until I wait for the model to (re)train for pure NAR
|
2024-11-09 22:57:34 -06:00 |
|
|
c127c4e488
|
'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)
|
2024-11-07 21:19:14 -06:00 |
|
|
d229725c76
|
more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)
|
2024-11-03 18:31:28 -06:00 |
|
|
aee08b7307
|
changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system)
|
2024-11-03 09:58:29 -06:00 |
|
|
ded746e157
|
very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough)
|
2024-11-02 11:49:05 -05:00 |
|
|
ec79230965
|
shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)
|
2024-11-01 21:30:06 -05:00 |
|
|
9b6c57bc57
|
third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)
|
2024-11-01 12:50:37 -05:00 |
|
|
a22534e8f4
|
layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)
|
2024-10-30 20:05:45 -05:00 |
|
|
a96f5aee32
|
adjusted how i want to pass eval kwargs
|
2024-10-25 20:38:09 -05:00 |
|
|
92e6bff6dc
|
actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time
|
2024-10-23 00:03:35 -05:00 |
|
|
8920e5e86b
|
actually have beam_width in the webUI work
|
2024-10-22 22:06:22 -05:00 |
|
|
910571ad34
|
too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time
|
2024-10-22 20:13:54 -05:00 |
|
|
8eb9a4056b
|
modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling
|
2024-10-22 18:12:39 -05:00 |
|
|
1a02cd5bce
|
modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file
|
2024-10-21 19:52:02 -05:00 |
|
|
71731ed785
|
added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)
|
2024-10-18 17:19:52 -05:00 |
|
|
fc8dfd8617
|
made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)
|
2024-10-18 16:55:00 -05:00 |
|
|
75b90be325
|
cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified
|
2024-10-17 17:06:48 -05:00 |
|
|
04e983b86b
|
modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now
|
2024-10-12 11:27:55 -05:00 |
|
|
666e8038fb
|
ugh
|
2024-10-12 10:41:35 -05:00 |
|
|
d0ab7d755a
|
added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix
|
2024-10-11 22:36:06 -05:00 |
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
75a4c866d6
|
more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)
|
2024-10-10 19:04:12 -05:00 |
|
|
acdce66d4e
|
readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well
|
2024-10-05 22:53:53 -05:00 |
|
|
84c7419001
|
faster
|
2024-10-04 22:30:47 -05:00 |
|
|
a507b769a1
|
sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)
|
2024-10-04 22:18:20 -05:00 |
|