Commit Graph

615 Commits

Author SHA1 Message Date
mrq
92e6bff6dc actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time 2024-10-23 00:03:35 -05:00
mrq
8920e5e86b actually have beam_width in the webUI work 2024-10-22 22:06:22 -05:00
mrq
910571ad34 too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time 2024-10-22 20:13:54 -05:00
mrq
8eb9a4056b modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling 2024-10-22 18:12:39 -05:00
mrq
1a02cd5bce modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file 2024-10-21 19:52:02 -05:00
mrq
02dfc60ac3 ugh 2024-10-18 17:23:22 -05:00
mrq
71731ed785 added prefixing with silence (was to test something, currently hidden under cfg.experimental=True) 2024-10-18 17:19:52 -05:00
mrq
6b04c13c56 print warning if audio promtpless inferencing with low AR temp (it really doesn't like low temps / greedy sampling) 2024-10-18 17:01:40 -05:00
mrq
c8f31db1de default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters) 2024-10-18 16:58:56 -05:00
mrq
fc8dfd8617 made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar) 2024-10-18 16:55:00 -05:00
mrq
07f4935a75 more tweaks 2024-10-18 13:19:36 -05:00
mrq
0dfab973e7 oops 2024-10-18 09:40:06 -05:00
mrq
75b90be325 cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified 2024-10-17 17:06:48 -05:00
mrq
8b6095f681 saner defaults, maybe 2024-10-17 14:37:21 -05:00
mrq
f88097ccf6 add config option to set the rate of sampling randomly vs similar speakers during training 2024-10-16 14:27:58 -05:00
mrq
48461833c2 ugh 2024-10-15 19:30:43 -05:00
mrq
eea70f5698 kludge fix for an oversight in the model when trying to train for longer input prompt durations...... 2024-10-15 19:25:03 -05:00
mrq
84005c5b00 entropix apparently processes the entire sequence of logits but it falls apart when doing that 2024-10-13 12:01:12 -05:00
mrq
c800d28bb8 respect attention defined in the yaml for web UI (which might explain why theres been a discrepancy in outputs for me) 2024-10-13 11:02:24 -05:00
mrq
ed6b7a690f ugh......... 2024-10-13 00:26:46 -05:00
mrq
d405f243d4 at wits end in trying to output the right attention scores 2024-10-12 23:53:13 -05:00
mrq
70cf694cfd output attention scores for SDPA/flash, since naive attention seems broken 2024-10-12 12:09:17 -05:00
mrq
541e45263c ugh 2024-10-12 11:29:16 -05:00
mrq
04e983b86b modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now 2024-10-12 11:27:55 -05:00
mrq
666e8038fb ugh 2024-10-12 10:41:35 -05:00
mrq
3d6ef9666b overridden naive llama attention to get the right score values that entropix needs 2024-10-12 10:05:47 -05:00
mrq
40b089daf3 lol 2024-10-12 09:57:34 -05:00
mrq
d6f7c86a5c entropix tweaks (it doesn't output garbage but it loves to go for silence) 2024-10-12 09:46:18 -05:00
mrq
d0ab7d755a added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix 2024-10-11 22:36:06 -05:00
mrq
bef43a0c18 added experimental entropix sampling support 2024-10-11 21:18:26 -05:00
mrq
85d85c1351 more arg creep for demo page 2024-10-10 19:40:01 -05:00
mrq
301468f519 << 2024-10-10 19:13:52 -05:00
mrq
75a4c866d6 more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI) 2024-10-10 19:04:12 -05:00
mrq
96d05be73c demo page tweaks 2024-10-10 13:52:37 -05:00
mrq
2ea978f318 added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval 2024-10-10 13:40:25 -05:00
mrq
52299127ab fix vall_e.emb.process 2024-10-08 20:00:34 -05:00
mrq
0656a762af fix vall_e.emb.transcriber 2024-10-08 19:24:43 -05:00
mrq
acdce66d4e readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well 2024-10-05 22:53:53 -05:00
mrq
84c7419001 faster 2024-10-04 22:30:47 -05:00
mrq
a507b769a1 sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit) 2024-10-04 22:18:20 -05:00
mrq
4a8e3ccf06 README tweaks, added --input-prompt-prefix as an experiment (its literally better to just not do this, but i'll retain it in case i have a revelation on how to improve it) 2024-10-04 18:57:19 -05:00
mrq
a9fa0898a9 tweaked demo page script to sample speakers instead 2024-09-28 10:50:26 -05:00
mrq
2f1dca3089 added language selection in web UI, tweaked demo script 2024-09-28 09:49:45 -05:00
mrq
10df2ef5f3 fixed oversight where input audio does not resample (lol...) 2024-09-27 20:27:53 -05:00
mrq
039482a48e don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag) 2024-09-26 18:56:57 -05:00
mrq
ff7a1b4163 coerce into path for other sampler_types (it's required for sampling for similar utterances) 2024-09-26 18:37:56 -05:00
mrq
f24547ad4e add top_k sampling / offset for prompt similar utterance sampling 2024-09-26 16:26:40 -05:00
mrq
9da630f73a swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically) 2024-09-25 23:31:24 -05:00
mrq
e84d466261 vall_e.plot tweaks 2024-09-24 20:05:10 -05:00
mrq
2266d34818 oops 2024-09-21 16:06:01 -05:00