|
a22534e8f4
|
layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)
|
2024-10-30 20:05:45 -05:00 |
|
|
4049f51ba9
|
added option to load lora directly from the model file itself with --lora
|
2024-10-26 00:13:10 -05:00 |
|
|
023c3af331
|
updated readme to reflect changes
|
2024-10-25 22:17:05 -05:00 |
|
|
ccf71dc1b6
|
added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided
|
2024-10-25 22:15:15 -05:00 |
|
|
a96f5aee32
|
adjusted how i want to pass eval kwargs
|
2024-10-25 20:38:09 -05:00 |
|
|
92e6bff6dc
|
actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time
|
2024-10-23 00:03:35 -05:00 |
|
|
8920e5e86b
|
actually have beam_width in the webUI work
|
2024-10-22 22:06:22 -05:00 |
|
|
910571ad34
|
too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time
|
2024-10-22 20:13:54 -05:00 |
|
|
8eb9a4056b
|
modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling
|
2024-10-22 18:12:39 -05:00 |
|
|
1a02cd5bce
|
modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file
|
2024-10-21 19:52:02 -05:00 |
|
|
02dfc60ac3
|
ugh
|
2024-10-18 17:23:22 -05:00 |
|
|
71731ed785
|
added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)
|
2024-10-18 17:19:52 -05:00 |
|
|
6b04c13c56
|
print warning if audio promptless inferencing with low AR temp (it really doesn't like low temps / greedy sampling)
|
2024-10-18 17:01:40 -05:00 |
|
|
c8f31db1de
|
default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)
|
2024-10-18 16:58:56 -05:00 |
|
|
fc8dfd8617
|
made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)
|
2024-10-18 16:55:00 -05:00 |
|
|
07f4935a75
|
more tweaks
|
2024-10-18 13:19:36 -05:00 |
|
|
0dfab973e7
|
oops
|
2024-10-18 09:40:06 -05:00 |
|
|
75b90be325
|
cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified
|
2024-10-17 17:06:48 -05:00 |
|
|
8b6095f681
|
saner defaults, maybe
|
2024-10-17 14:37:21 -05:00 |
|
|
f88097ccf6
|
add config option to set the rate of sampling randomly vs similar speakers during training
|
2024-10-16 14:27:58 -05:00 |
|
|
48461833c2
|
ugh
|
2024-10-15 19:30:43 -05:00 |
|
|
eea70f5698
|
kludge fix for an oversight in the model when trying to train for longer input prompt durations......
|
2024-10-15 19:25:03 -05:00 |
|
|
84005c5b00
|
entropix apparently processes the entire sequence of logits but it falls apart when doing that
|
2024-10-13 12:01:12 -05:00 |
|
|
c800d28bb8
|
respect attention defined in the yaml for web UI (which might explain why there's been a discrepancy in outputs for me)
|
2024-10-13 11:02:24 -05:00 |
|
|
ed6b7a690f
|
ugh.........
|
2024-10-13 00:26:46 -05:00 |
|
|
d405f243d4
|
at wits end in trying to output the right attention scores
|
2024-10-12 23:53:13 -05:00 |
|
|
70cf694cfd
|
output attention scores for SDPA/flash, since naive attention seems broken
|
2024-10-12 12:09:17 -05:00 |
|
|
541e45263c
|
ugh
|
2024-10-12 11:29:16 -05:00 |
|
|
04e983b86b
|
modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now
|
2024-10-12 11:27:55 -05:00 |
|
|
666e8038fb
|
ugh
|
2024-10-12 10:41:35 -05:00 |
|
|
3d6ef9666b
|
overridden naive llama attention to get the right score values that entropix needs
|
2024-10-12 10:05:47 -05:00 |
|
|
40b089daf3
|
lol
|
2024-10-12 09:57:34 -05:00 |
|
|
d6f7c86a5c
|
entropix tweaks (it doesn't output garbage but it loves to go for silence)
|
2024-10-12 09:46:18 -05:00 |
|
|
d0ab7d755a
|
added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix
|
2024-10-11 22:36:06 -05:00 |
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
85d85c1351
|
more arg creep for demo page
|
2024-10-10 19:40:01 -05:00 |
|
|
301468f519
|
<<
|
2024-10-10 19:13:52 -05:00 |
|
|
75a4c866d6
|
more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: set up arg flags to handle this, and checkbox in web UI)
|
2024-10-10 19:04:12 -05:00 |
|
|
96d05be73c
|
demo page tweaks
|
2024-10-10 13:52:37 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
52299127ab
|
fix vall_e.emb.process
|
2024-10-08 20:00:34 -05:00 |
|
|
0656a762af
|
fix vall_e.emb.transcriber
|
2024-10-08 19:24:43 -05:00 |
|
|
acdce66d4e
|
readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well
|
2024-10-05 22:53:53 -05:00 |
|
|
84c7419001
|
faster
|
2024-10-04 22:30:47 -05:00 |
|
|
a507b769a1
|
sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)
|
2024-10-04 22:18:20 -05:00 |
|
|
4a8e3ccf06
|
README tweaks, added --input-prompt-prefix as an experiment (it's literally better to just not do this, but i'll retain it in case i have a revelation on how to improve it)
|
2024-10-04 18:57:19 -05:00 |
|
|
a9fa0898a9
|
tweaked demo page script to sample speakers instead
|
2024-09-28 10:50:26 -05:00 |
|
|
2f1dca3089
|
added language selection in web UI, tweaked demo script
|
2024-09-28 09:49:45 -05:00 |
|
|
10df2ef5f3
|
fixed oversight where input audio does not resample (lol...)
|
2024-09-27 20:27:53 -05:00 |
|
|
039482a48e
|
don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)
|
2024-09-26 18:56:57 -05:00 |
|