305 Commits

Author SHA1 Message Date
mrq
48490757da fixes 2024-11-10 20:37:50 -06:00
mrq
9def34cd66 lol 2024-11-10 12:48:41 -06:00
mrq
9cb0b6901b unified nar.py into ar_nar.py 2024-11-10 12:19:48 -06:00
mrq
a9d2faf2d7 all I can do now until I wait for the model to (re)train for pure NAR 2024-11-09 22:57:34 -06:00
mrq
ad7e290a5e ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup) 2024-11-09 19:40:02 -06:00
mrq
943fe70c10 I don't know why this fixes an assert thrown but it does 2024-11-09 19:04:13 -06:00
mrq
f50d92ba6c Almost made a mistake 2024-11-09 18:12:54 -06:00
mrq
c6a38693a2 This better work 2024-11-09 18:04:59 -06:00
mrq
8b3d1cf70a Something's Wrong 2024-11-09 15:07:43 -06:00
mrq
dcd5fecff3 some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audio after 3 seconds or so) 2024-11-09 12:12:46 -06:00
mrq
69b0b3b854 set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp 2024-11-09 00:11:16 -06:00
mrq
5a09a5f6e9 I forgot about the time embedding... 2024-11-08 22:46:26 -06:00
mrq
811b15d280 I suppose I just have a shit training method since the sampler is as solid as I can get it............... 2024-11-08 22:05:41 -06:00
mrq
13b54953bd agony 2024-11-08 13:34:39 -06:00
mrq
c127c4e488 'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough) 2024-11-07 21:19:14 -06:00
mrq
e108c54daf new NAR-len training paradigm...... 2024-11-07 11:32:11 -06:00
mrq
ed174c589e ugh 2024-11-07 09:19:21 -06:00
mrq
d13ab00ad8 one more note 2024-11-07 09:11:21 -06:00
mrq
5698188824 I'm really such an idiot 2024-11-07 09:10:18 -06:00
mrq
77ff23e319 repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more) 2024-11-06 23:29:53 -06:00
mrq
105ed51159 I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something) 2024-11-06 19:17:12 -06:00
mrq
aefe8fcdad UGH 2024-11-05 22:13:58 -06:00
mrq
9e65e05e83 more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm) 2024-11-04 18:00:33 -06:00
mrq
c83670c38c Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default) 2024-11-03 19:19:15 -06:00
mrq
d229725c76 more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.) 2024-11-03 18:31:28 -06:00
mrq
aee08b7307 changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system) 2024-11-03 09:58:29 -06:00
mrq
3826f9bae4 saner mask creation? (it doesn't matter, kv cache won't work) 2024-11-02 21:00:21 -05:00
mrq
ded746e157 very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough) 2024-11-02 11:49:05 -05:00
mrq
ec79230965 shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation) 2024-11-01 21:30:06 -05:00
mrq
fb8faa295b actually float16(+AMP) and layerskip is bad and will kill the model...... 2024-11-01 18:36:44 -05:00
mrq
9b6c57bc57 third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router) 2024-11-01 12:50:37 -05:00
mrq
76ebef45dc off-by-one... 2024-10-31 13:24:48 -05:00
mrq
b63293cbbe ugh 2024-10-30 22:49:11 -05:00
mrq
a22534e8f4 layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this) 2024-10-30 20:05:45 -05:00
mrq
ccf71dc1b6 added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided 2024-10-25 22:15:15 -05:00
mrq
a96f5aee32 adjusted how i want to pass eval kwargs 2024-10-25 20:38:09 -05:00
mrq
92e6bff6dc actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time 2024-10-23 00:03:35 -05:00
mrq
8920e5e86b actually have beam_width in the webUI work 2024-10-22 22:06:22 -05:00
mrq
910571ad34 too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time 2024-10-22 20:13:54 -05:00
mrq
8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq
1a02cd5bce modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file 2024-10-21 19:52:02 -05:00
mrq
71731ed785 added prefixing with silence (was to test something, currently hidden under cfg.experimental=True) 2024-10-18 17:19:52 -05:00
mrq
fc8dfd8617 made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar) 2024-10-18 16:55:00 -05:00
mrq
75b90be325 cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified 2024-10-17 17:06:48 -05:00
mrq
84005c5b00 entropix apparently processes the entire sequence of logits but it falls apart when doing that 2024-10-13 12:01:12 -05:00
mrq
c800d28bb8 respect attention defined in the yaml for web UI (which might explain why there's been a discrepancy in outputs for me) 2024-10-13 11:02:24 -05:00
mrq
ed6b7a690f ugh......... 2024-10-13 00:26:46 -05:00
mrq
d405f243d4 at wit's end in trying to output the right attention scores 2024-10-12 23:53:13 -05:00
mrq
70cf694cfd output attention scores for SDPA/flash, since naive attention seems broken 2024-10-12 12:09:17 -05:00
mrq
04e983b86b modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now 2024-10-12 11:27:55 -05:00
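
Note on ad7e290a5e: that commit message reports that ROCm appeared to silently clamp target token values >= logits.shape[-1] during loss calculation, while CUDA throws an assert. A backend-independent guard is one way to surface that kind of mistake early. The sketch below is illustrative only and is not code from this repository: the names find_out_of_range_tokens and check_targets are hypothetical, targets are treated as a plain list of ints, and ignore_index=-100 is assumed because it is the default used by PyTorch's cross-entropy loss.

```python
def find_out_of_range_tokens(targets, vocab_size, ignore_index=-100):
    """Return (position, token) pairs whose token id cannot index the logits.

    `vocab_size` stands in for logits.shape[-1]. Positions equal to
    `ignore_index` are skipped, since the loss would ignore them anyway.
    """
    bad = []
    for pos, tok in enumerate(targets):
        if tok == ignore_index:
            continue
        if tok < 0 or tok >= vocab_size:
            bad.append((pos, tok))
    return bad


def check_targets(targets, vocab_size, ignore_index=-100):
    """Raise ValueError before the loss is computed if any target is invalid."""
    bad = find_out_of_range_tokens(targets, vocab_size, ignore_index)
    if bad:
        preview = ", ".join(f"pos {p}: {t}" for p, t in bad[:5])
        raise ValueError(
            f"{len(bad)} target token(s) outside [0, {vocab_size}): {preview}"
        )


if __name__ == "__main__":
    # Hypothetical example: 1024 audio codes plus one stop token gives 1025 classes,
    # so the largest valid target id is 1024.
    check_targets([0, 5, 1024, -100], vocab_size=1025)  # passes silently
    try:
        check_targets([0, 5, 1025], vocab_size=1025)
    except ValueError as err:
        print(err)
```

Running a check like this on the CPU side of the training loop, at least while debugging, reports the offending position regardless of which GPU backend is in use.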