|
84a05acb6d
|
touch ups in docs
|
2024-12-02 19:10:42 -06:00 |
|
|
6aee08f9c0
|
moved stuff in the web UI around (un-experimented the max NAR-len steps because its kind of important to adjust this value for better sounding audio / quicker generated audio)
|
2024-11-20 20:37:33 -06:00 |
|
|
b1369e7824
|
better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)
|
2024-11-19 18:51:17 -06:00 |
|
|
069b27570f
|
set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)
|
2024-11-17 17:04:07 -06:00 |
|
|
88d840218d
|
default set cfg strength to 3.0 since the reference model is updated
|
2024-11-17 10:23:40 -06:00 |
|
|
0f2584eba7
|
new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)
|
2024-11-12 22:30:09 -06:00 |
|
|
9e65e05e83
|
more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)
|
2024-11-04 18:00:33 -06:00 |
|
|
c83670c38c
|
Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)
|
2024-11-03 19:19:15 -06:00 |
|
|
d229725c76
|
more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)
|
2024-11-03 18:31:28 -06:00 |
|
|
aee08b7307
|
changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system)
|
2024-11-03 09:58:29 -06:00 |
|
|
a22534e8f4
|
layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)
|
2024-10-30 20:05:45 -05:00 |
|
|
4049f51ba9
|
added option to load lora directly from the model file itself with --lora
|
2024-10-26 00:13:10 -05:00 |
|
|
8920e5e86b
|
actually have beam_width in the webUI work
|
2024-10-22 22:06:22 -05:00 |
|
|
8eb9a4056b
|
modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling
|
2024-10-22 18:12:39 -05:00 |
|
|
1a02cd5bce
|
modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file
|
2024-10-21 19:52:02 -05:00 |
|
|
02dfc60ac3
|
ugh
|
2024-10-18 17:23:22 -05:00 |
|
|
c8f31db1de
|
default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)
|
2024-10-18 16:58:56 -05:00 |
|
|
07f4935a75
|
more tweaks
|
2024-10-18 13:19:36 -05:00 |
|
|
0dfab973e7
|
oops
|
2024-10-18 09:40:06 -05:00 |
|
|
70cf694cfd
|
output attention scores for SDPA/flash, since naive attention seems broken
|
2024-10-12 12:09:17 -05:00 |
|
|
541e45263c
|
ugh
|
2024-10-12 11:29:16 -05:00 |
|
|
04e983b86b
|
modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now
|
2024-10-12 11:27:55 -05:00 |
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
|
|
85d85c1351
|
more arg creep for demo page
|
2024-10-10 19:40:01 -05:00 |
|
|
301468f519
|
<<
|
2024-10-10 19:13:52 -05:00 |
|
|
75a4c866d6
|
more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)
|
2024-10-10 19:04:12 -05:00 |
|
|
96d05be73c
|
demo page tweaks
|
2024-10-10 13:52:37 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
a9fa0898a9
|
tweaked demo page script to sample speakers instead
|
2024-09-28 10:50:26 -05:00 |
|
|
2f1dca3089
|
added language selection in web UI, tweaked demo script
|
2024-09-28 09:49:45 -05:00 |
|
|
039482a48e
|
don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)
|
2024-09-26 18:56:57 -05:00 |
|
|
9da630f73a
|
swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)
|
2024-09-25 23:31:24 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
3a65cc4b22
|
fix issue with sft and shared tensors...
|
2024-08-04 19:56:21 -05:00 |
|
|
ad024f400f
|
actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji
|
2024-07-21 23:21:37 -05:00 |
|
|
3e5ca3a201
|
more demo page tweaks
|
2024-07-21 19:31:13 -05:00 |
|
|
7366f36f81
|
oops
|
2024-07-21 19:17:25 -05:00 |
|
|
e19aa643a6
|
cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training
|
2024-07-21 19:12:03 -05:00 |
|
|
ba7ee8c0ee
|
added demo link to readme
|
2024-07-19 21:22:30 -05:00 |
|
|
9ec88d9444
|
validated passing URI path for assets instead of base64 encoding them
|
2024-07-19 21:07:17 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|