|
7617b6485f
|
instead just compute a bunch of stuff on the transcriptions and store it under different names so I can just retrieve what I want later, also added tongue twisters for nefarious reasons
|
2024-12-18 23:43:11 -06:00 |
|
|
4775edaa41
|
added text cleaning/normalization for wer purposes but it amounts to nothing desu
|
2024-12-18 19:58:53 -06:00 |
|
|
9090c34f10
|
cringe script to process seed-tts-eval's eval dataset into something i can easily use
|
2024-12-17 22:47:12 -06:00 |
|
|
ed152f78df
|
tweaks to prompt duration to allow me to divorce how I use it for training from how I'm using it for the demo page, and demo page tweaks to make my life easier
|
2024-12-17 19:33:04 -06:00 |
|
|
7129582303
|
actually do proper wer/cer calculation by un-normalizing the scores
|
2024-12-17 14:22:30 -06:00 |
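The commit doesn't spell out the exact computation. For reference, WER and CER are conventionally the edit distance between the reference and hypothesis (over words or characters) divided by the reference length. A minimal, dependency-free sketch of that standard definition (function names are hypothetical, not the repo's):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / max(len(ref), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis has many insertions.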
|
|
c2c6d912ac
|
actually do speaker verification
|
2024-12-17 10:11:14 -06:00 |
|
|
c2e17e287b
|
really shoddy voice conversion implementation (it sort of works...)
|
2024-12-16 22:54:53 -06:00 |
|
|
8515038968
|
imagine my disappointment when the epoch finished just for it to throw an exception
|
2024-12-16 18:28:01 -06:00 |
|
|
4a65ac9eb7
|
oops
|
2024-12-15 17:21:51 -06:00 |
|
|
cd4a5f427c
|
KO/ZH model soon
|
2024-12-15 17:01:14 -06:00 |
|
|
4800e7179a
|
remove nan checks because they cause problems in distributed training since I'm not syncing between GPUs (and nan losses get ignored anyway with loss scaling)
|
2024-12-15 09:42:54 -06:00 |
|
|
2ba6b483dc
|
ugh
|
2024-12-14 22:43:51 -06:00 |
|
|
3dd31e74d1
|
finally figured out a clean way to handle "resuming" the tqdm bar
|
2024-12-14 18:44:43 -06:00 |
|
|
35389481ee
|
move lazy-stored ortho matrix to the grad device for apollo because agony
|
2024-12-13 23:22:26 -06:00 |
|
|
09804ecc16
|
APOLLO tweaks to make it work with deepspeed
|
2024-12-13 23:03:52 -06:00 |
|
|
64c67160a3
|
tweaks
|
2024-12-13 19:00:35 -06:00 |
|
|
0fbfb8bbe8
|
actually save the optimizer for the local engine backend because safetensors doesn't save it
|
2024-12-12 17:12:59 -06:00 |
|
|
f41251f648
|
more fixes for local engine backend
|
2024-12-12 14:38:42 -06:00 |
|
|
6b237ae5e3
|
tweaks for the local engine orchestrator (that I never caught since I always used the deepspeed backend)
|
2024-12-12 13:37:38 -06:00 |
|
|
9a62e3b824
|
APOLLO cringe (doesn't want to work with deepspeed)
|
2024-12-12 00:31:58 -06:00 |
|
|
cddf8ca814
|
sort batches to try and reduce number of padded tokens in batched inference (also commented out F5 samples getting added to the demo page because I would have to regenerate them)
|
2024-12-11 22:45:38 -06:00 |
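Sorting by length before chunking keeps similarly sized utterances in the same batch, so each batch pads to a shorter maximum. A minimal sketch (hypothetical helpers, with text/token lengths standing in for whatever the repo actually sorts on):

```python
def make_batches(lengths, batch_size, sort=True):
    """Group sample indices into batches; sorting by length first keeps similar lengths together."""
    order = list(range(len(lengths)))
    if sort:
        order.sort(key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padded_tokens(lengths, batches):
    """Padding tokens needed when every batch is padded to its longest member."""
    return sum(
        max(lengths[i] for i in batch) * len(batch) - sum(lengths[i] for i in batch)
        for batch in batches
    )
```

Since each batch holds the original indices, outputs can be scattered back into input order after inference.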
|
|
20b87bfbd0
|
store metrics and only recalculate them if the output file is newer than the metrics file
|
2024-12-11 20:55:43 -06:00 |
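A minimal sketch of the mtime check described above, assuming metrics are cached as JSON next to each output; the function name and `compute` callback are hypothetical:

```python
import json
import os


def load_or_compute_metrics(output_path, metrics_path, compute):
    """Return cached metrics unless the output file is newer than the metrics file."""
    if os.path.exists(metrics_path) and os.path.getmtime(output_path) <= os.path.getmtime(metrics_path):
        with open(metrics_path, "r", encoding="utf-8") as f:
            return json.load(f)
    # output was (re)generated after the metrics were stored, or no cache exists yet
    metrics = compute(output_path)
    with open(metrics_path, "w", encoding="utf-8") as f:
        json.dump(metrics, f)
    return metrics
```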
|
|
0c69e798f7
|
template cleanup
|
2024-12-11 20:06:55 -06:00 |
|
|
7e54e897f7
|
also shifted to transformers' pipeline for transcribing
|
2024-12-11 19:57:53 -06:00 |
|
|
b81a98799b
|
uplifting transformers' WavLM stuff to do speaker verification instead
|
2024-12-11 19:30:05 -06:00 |
|
|
6468e5d124
|
lol
|
2024-12-11 19:10:32 -06:00 |
|
|
6f1ee0c6fa
|
Added CER, transcription/similarity model args in demo
|
2024-12-10 21:00:51 -06:00 |
|
|
8568a93dad
|
added WER/SIM-O metrics, added APOLLO but I need to test it
|
2024-12-10 20:13:21 -06:00 |
|
|
a6c745bafb
|
chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted
|
2024-12-09 14:26:19 -06:00 |
|
|
3ef8894290
|
oops
|
2024-12-08 15:24:21 -06:00 |
|
|
1d460b9fe3
|
logic fixes, I feel like output is better? (also NAR can have a temperature; I imagine it couldn't before because it was having a causal mask passed to it for the longest time before I caught it a month ago)
|
2024-12-08 14:52:47 -06:00 |
|
|
0c5a458b00
|
deduce language per line as a cheap way to allow for cross-lingual switching, kinda
|
2024-12-07 22:57:29 -06:00 |
|
|
a032ff588f
|
doc update, added automatically deducing language from a given text, also checks if the input is already phonemized text to allow direct control without being cringe (procrastinating adding WER/SIM-O)
|
2024-12-07 22:34:25 -06:00 |
|
|
5d80a2d0d4
|
fixed NAR-len issues with non-english maybe (langs weren't being passed), added interface to inference in batches through tts.batched_inference (no support for rolling context/prefixes because there's no way to do that), demo page uses batched inferencing now
|
2024-12-07 19:21:05 -06:00 |
|
|
1f54bf5b40
|
revert sageattn back to optional dependency because it's not on windows, force resize_modules on by default because I broke something
|
2024-12-07 17:09:39 -06:00 |
|
|
218d0e29fd
|
ugh (batchmean actually expects batch=seq_len, and not the actual batch)
|
2024-12-07 12:39:01 -06:00 |
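In PyTorch, `F.kl_div(..., reduction="batchmean")` divides the summed loss by the input's first dimension, so when logits are flattened to `(num_tokens, vocab)` it averages over tokens rather than over the real batch. The sketch below mirrors that in plain Python, with the softmax taken per token row over the vocabulary (not over a flattened 1D tensor). The `T**2` scaling is the usual distillation convention and an assumption here, not necessarily what the trainer does:

```python
import math


def log_softmax(row, temperature=1.0):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) per token row, averaged over rows ("batchmean" with batch = tokens)."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_logp = log_softmax(s_row, temperature)
        t_logp = log_softmax(t_row, temperature)
        total += sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))
    return total / len(student_logits) * temperature ** 2
```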
|
|
61ed662856
|
ACTUALLY actually fix KD-loss (the -inf in the logits was caused by cringecode)
|
2024-12-07 12:31:54 -06:00 |
|
|
f97e8b0c7f
|
ACTUALLY do KD-loss because of an oversight with masked_select outputting 1D tensors that get softmax'd in total
|
2024-12-07 09:52:51 -06:00 |
|
|
34a66e1052
|
agnostified KD
|
2024-12-06 23:53:46 -06:00 |
|
|
953d3eb030
|
ugh
|
2024-12-06 22:35:30 -06:00 |
|
|
42fafbaaca
|
actually fixed knowledge distillation because of errant -inf logits causing problems and needed to be filtered (and splitting text language / output audio language because it helps)
|
2024-12-06 21:55:20 -06:00 |
|
|
23d402bf01
|
added knowledge distillation in the trainer (sadly it is not agnostic because of the grave mistake of further processing the batch within the forward pass, so subsequent calls do not match......)
|
2024-12-05 23:05:52 -06:00 |
|
|
4e21df8092
|
oops
|
2024-12-04 21:24:22 -06:00 |
|
|
93d27be539
|
rolling context finally (use last N utterances as the prefix for the next gen), option to split input text prompt by sentences instead of lines (or no splitting)
|
2024-12-04 20:31:44 -06:00 |
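The "last N utterances as the prefix" idea maps naturally onto a bounded deque. A minimal sketch, where `generate(line, prefix=...)` is a hypothetical stand-in for the actual inference call:

```python
from collections import deque


def generate_with_rolling_context(lines, generate, context_size=2):
    """Generate each line, feeding the last `context_size` outputs as the prefix."""
    history = deque(maxlen=context_size)  # oldest output is dropped automatically
    outputs = []
    for line in lines:
        out = generate(line, prefix=list(history))
        outputs.append(out)
        history.append(out)
    return outputs
```

This is inherently sequential, since each line depends on the previous outputs.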
|
|
9dff68c0c5
|
NAR-len tweaks (remasks a small number of tokens per step, which seems to help reduce the number of steps needed some of the time?; disable CFG for the first half to speed things up)
|
2024-12-04 09:30:29 -06:00 |
|
|
cf97560e70
|
minimum CFG of 3 for NAR-len because it seems the model will auto-default to NAR-len now
|
2024-12-03 19:40:05 -06:00 |
|
|
ca31da0a95
|
sageattn (forgot to bother with testing this the other day, seems fine)
|
2024-12-03 15:14:57 -06:00 |
|
|
31ab90d84a
|
cringe code to convert to LlamaForCausalLM-happy weights + tokenizer dict (still need to write logic to actually use these weights for proper inferencing)
|
2024-12-03 10:18:58 -06:00 |
|
|
84a05acb6d
|
touch ups in docs
|
2024-12-02 19:10:42 -06:00 |
|
|
dcaf38b359
|
fixed training tqdm being stubborn
|
2024-11-23 09:45:23 -06:00 |
|
|
41d7c30ea5
|
added much cleaner non-causal mask generation
|
2024-11-22 19:43:32 -06:00 |
|
|
c99a74e834
|
actually generate a causal mask because it seems sometimes it does not actually generate one because it makes assumptions
|
2024-11-22 18:30:24 -06:00 |
|
|
ccee5fc11c
|
that was actually all pointless since sdpa always had an attention mask fed to it and does not need is_causal to implicitly generate one
|
2024-11-22 16:51:50 -06:00 |
|
|
4aa685e749
|
what has science done
|
2024-11-22 16:45:40 -06:00 |
|
|
147219a5e0
|
huge oversight in the attention masking......... (i realized I have not been providing a non-causal mask to non-causal tasks)
|
2024-11-22 13:44:43 -06:00 |
|
|
24d888c47c
|
temporarily dropping support for xformers because it's breaking when using an attention mask (I don't remember commenting it out when one is being passed), default to not using wandb because it's being a pain when doing tests and not actual sessions
|
2024-11-22 11:29:12 -06:00 |
|
|
8aafae91fd
|
don't use timeembedding
|
2024-11-21 23:14:52 -06:00 |
|
|
2cef97e43f
|
cleanup
|
2024-11-21 23:08:43 -06:00 |
|
|
3fc0540f49
|
m
|
2024-11-21 15:07:46 -06:00 |
|
|
6845c447c9
|
added more harvard sentences to load from a text file
|
2024-11-21 13:18:11 -06:00 |
|
|
2a084544e8
|
moved duration padding for NAR-len to be a scalar instead (since it seems longer utterances need it much more so than shorter utterances)
|
2024-11-21 13:04:07 -06:00 |
|
|
6aee08f9c0
|
moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)
|
2024-11-20 20:37:33 -06:00 |
|
|
dfdba3f190
|
oops
|
2024-11-20 19:21:03 -06:00 |
|
|
cd6e9ba2f2
|
oops
|
2024-11-20 16:27:51 -06:00 |
|
|
1a73ac6a20
|
I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics)
|
2024-11-20 16:10:47 -06:00 |
|
|
67f7bad168
|
added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)
|
2024-11-20 14:22:12 -06:00 |
|
|
db64e6cb59
|
dependency updates (gradio 5.x now works on my machine)
|
2024-11-20 12:33:01 -06:00 |
|
|
b1369e7824
|
better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)
|
2024-11-19 18:51:17 -06:00 |
|
|
190a917b3e
|
I did it.
|
2024-11-19 12:24:33 -06:00 |
|
|
0e621354e7
|
cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model)
|
2024-11-19 10:30:05 -06:00 |
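For context, the common classifier-free guidance formulation runs the model twice (with and without conditioning) and extrapolates from the unconditional logits toward the conditional ones. A minimal sketch of that standard formula; the repo's own logit processing may use a different variant:

```python
def cfg_logits(cond, uncond, strength):
    """Classifier-free guidance: uncond + strength * (cond - uncond)."""
    return [u + strength * (c - u) for c, u in zip(cond, uncond)]
```

A strength of 1.0 returns the conditional logits unchanged; larger values push further away from the unconditional prediction.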
|
|
5ba80686e1
|
two weeks of agony concludes
|
2024-11-18 21:29:28 -06:00 |
|
|
2b29790173
|
oops
|
2024-11-18 14:12:26 -06:00 |
|
|
4a71981456
|
normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things
|
2024-11-18 12:46:50 -06:00 |
|
|
6cfdf94bf9
|
swap priority to use nar-len if available, added notes
|
2024-11-18 09:40:04 -06:00 |
|
|
069b27570f
|
added option to set the training masking ratio (I don't think a fixed masking ratio is beneficial for TTS, since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)
|
2024-11-17 17:04:07 -06:00 |
|
|
88d840218d
|
set default CFG strength to 3.0 since the reference model is updated
|
2024-11-17 10:23:40 -06:00 |
|
|
a3e1fa3518
|
ugh
|
2024-11-17 09:28:33 -06:00 |
|
|
23fdba0c98
|
tweaks and changes
|
2024-11-16 15:49:06 -06:00 |
|
|
2fbeacfe92
|
ugh
|
2024-11-14 22:18:33 -06:00 |
|
|
39096f8ff3
|
redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)
|
2024-11-14 22:17:47 -06:00 |
|
|
ef05c951ff
|
adjust fp16 loss scaling since I fried a model overnight when it hit 8K scale
|
2024-11-14 09:23:52 -06:00 |
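Dynamic fp16 loss scaling halves the scale on overflow and periodically doubles it after a run of clean steps; capping the growth keeps it from climbing indefinitely. A minimal sketch of that idea as a hypothetical class (the defaults are illustrative, not the repo's or any library's settings):

```python
class DynamicLossScaler:
    """Sketch of fp16 dynamic loss scaling with an upper bound on the scale."""

    def __init__(self, init_scale=2.0 ** 10, growth_interval=1000, max_scale=2.0 ** 12):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.max_scale = max_scale
        self._good_steps = 0

    def update(self, found_inf: bool) -> bool:
        """Adjust the scale after a step; return True if the optimizer step should be applied."""
        if found_inf:
            self.scale /= 2.0  # overflow: back off and skip this step
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale = min(self.scale * 2.0, self.max_scale)  # grow, but never past the cap
        return True
```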
|
|
e412e98125
|
ugh
|
2024-11-14 07:34:22 -06:00 |
|
|
c00fc18b62
|
actually use the right embedding for nar-len
|
2024-11-13 18:04:04 -06:00 |
|
|
3ea8a610d6
|
fix STT
|
2024-11-13 14:27:15 -06:00 |
|
|
910033343c
|
overhauled how the right resp level / classifier gets picked to avoid cringemath
|
2024-11-13 13:31:17 -06:00 |
|
|
269648605e
|
move NAR-len rvq level 0 to separate embedding
|
2024-11-13 11:38:58 -06:00 |
|
|
29e45be0b4
|
tweaks to bucket sampling
|
2024-11-13 11:09:24 -06:00 |
|
|
b2eca271a8
|
ugh
|
2024-11-13 10:35:44 -06:00 |
|
|
be83ddabaa
|
better causal-ness for split loss calc, and also do masking for NAR-len for it
|
2024-11-13 10:17:52 -06:00 |
|
|
6b76419123
|
ugh
|
2024-11-13 09:54:20 -06:00 |
|
|
ad7cfffc00
|
NAR-len RVQ-0 was being trained causally.............
|
2024-11-13 09:43:50 -06:00 |
|
|
976ee87f6f
|
resume iteration step in tqdm trainer, warn to logger if the sampler state dict was invalidated
|
2024-11-13 09:09:28 -06:00 |
|
|
8286aa54c8
|
do not pass timestep token/embedding since it doesn't seem to matter at all after all, fixed training masking rate to 80% because a paper said so
|
2024-11-13 09:07:10 -06:00 |
|
|
caf721c67b
|
set it to zero because it'll make the stop token hide more often than not
|
2024-11-12 22:30:50 -06:00 |
|
|
0f2584eba7
|
new meme sampler PogChamp new meme sampler PogChamp (it sort of helps?)
|
2024-11-12 22:30:09 -06:00 |
|
|
663f07038d
|
haha... (do not create a token dropout/noise mask when not training (this sadly didn't fix NAR-len output))
|
2024-11-12 16:41:58 -06:00 |
|
|
b09328069e
|
actually do CFG sampling for base AR+NAR tasks
|
2024-11-12 13:42:39 -06:00 |
|
|
2495a7ef67
|
Fixed STT in the web UI
|
2024-11-12 12:49:53 -06:00 |
|
|
8927bad7bc
|
actually fixed rep pen (for ar and nar, it seems to help with nar unmasking)
|
2024-11-11 21:40:19 -06:00 |
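For reference, the widely used CTRL-style repetition penalty lowers the logit of every previously generated token: positive logits are divided by the penalty, negative ones multiplied. A minimal sketch of that standard form (1.125 echoes the value mentioned in later commits; the repo's own variant may differ, e.g. in how it weights older tokens):

```python
def apply_repetition_penalty(logits, previous_tokens, penalty=1.125):
    """CTRL-style repetition penalty: make already-generated tokens less likely."""
    out = list(logits)
    for token in set(previous_tokens):  # each repeated token is penalized once
        # dividing a positive logit or multiplying a negative one both lower it
        out[token] = out[token] / penalty if out[token] > 0 else out[token] * penalty
    return out
```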
|
|
ec92613847
|
actually pass input prompt length size to inference
|
2024-11-11 20:39:48 -06:00 |
|
|
b1df6a7bed
|
reverted rep pen sampler due to a regression
|
2024-11-11 20:35:08 -06:00 |
|
|
b1f4db39c8
|
threw in CFG sampling for normal model as well to experiment with
|
2024-11-11 20:27:38 -06:00 |
|
|
2f56696506
|
overhauled inference/sampler kwargs to stop being a bloated mess
|
2024-11-11 20:21:16 -06:00 |
|
|
354f8e059d
|
store dataset hash alongside state dict so it can be ignored if mismatched
|
2024-11-11 18:16:56 -06:00 |
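A minimal sketch of storing a dataset fingerprint with the sampler state so a stale state can be ignored on resume. Hashing the sorted file list is an assumption about what "dataset hash" covers, and the helper names are hypothetical:

```python
import hashlib


def dataset_hash(paths):
    """Order-independent fingerprint of the dataset's file list."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(p.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()


def save_sampler_state(state, paths):
    return {"dataset_hash": dataset_hash(paths), "state": state}


def load_sampler_state(saved, paths):
    """Return the stored sampler state, or None if it was saved for a different dataset."""
    if saved.get("dataset_hash") != dataset_hash(paths):
        return None
    return saved["state"]
```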
|
|
f7b8b1e825
|
dropped subtrain dataloader since it's useless to duplicate
|
2024-11-11 17:00:49 -06:00 |
|
|
cf9df71f2c
|
use homebrewed caching system for dataloader paths / durations (I'm pretty sure I am now triggering OOM killers with my entire dataset used)
|
2024-11-11 16:32:08 -06:00 |
|
|
a748e223ce
|
tweaks
|
2024-11-11 12:40:41 -06:00 |
|
|
48490757da
|
fixes
|
2024-11-10 20:37:50 -06:00 |
|
|
9def34cd66
|
lol
|
2024-11-10 12:48:41 -06:00 |
|
|
9cb0b6901b
|
unified nar.py into ar_nar.py
|
2024-11-10 12:19:48 -06:00 |
|
|
a9d2faf2d7
|
all I can do now until I wait for the model to (re)train for pure NAR
|
2024-11-09 22:57:34 -06:00 |
|
|
ad7e290a5e
|
ugh (ROCm seems to silently clamp any token value >= logits.shape[-1] for loss calculation, while cuda will throw an assert, making it hard to find this dumb fuckup)
|
2024-11-09 19:40:02 -06:00 |
|
|
943fe70c10
|
I don't know why this fixes an assert thrown but it does
|
2024-11-09 19:04:13 -06:00 |
|
|
f50d92ba6c
|
Almost made a mistake
|
2024-11-09 18:12:54 -06:00 |
|
|
c6a38693a2
|
This better work
|
2024-11-09 18:04:59 -06:00 |
|
|
8b3d1cf70a
|
Something's Wrong
|
2024-11-09 15:07:43 -06:00 |
|
|
dcd5fecff3
|
some cleanup while I wait for the NAR-len to train to an acceptable state (currently it performs okay, but only on audio after 3 seconds or so)
|
2024-11-09 12:12:46 -06:00 |
|
|
69b0b3b854
|
set timestep tensor to whatever the time embedding's dtype is because it'll gripe under amp
|
2024-11-09 00:11:16 -06:00 |
|
|
5a09a5f6e9
|
I forgot about the time embedding...
|
2024-11-08 22:46:26 -06:00 |
|
|
811b15d280
|
I suppose I just have a shit training method since the sampler is as solid as I can get it...............
|
2024-11-08 22:05:41 -06:00 |
|
|
13b54953bd
|
agony
|
2024-11-08 13:34:39 -06:00 |
|
|
c127c4e488
|
'borrowed' a sampling scheduler for NAR-len's RVQ level 0 (better than before, but still not good enough)
|
2024-11-07 21:19:14 -06:00 |
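The commit doesn't name the borrowed scheduler. MaskGIT-style iterative decoding commonly uses a cosine schedule for how many tokens stay masked after each step, re-masking the lowest-confidence predictions. A minimal sketch assuming that scheme (helper names are hypothetical):

```python
import math


def masked_count(step, total_steps, seq_len):
    """Tokens left masked after `step` (0-indexed) under a cosine schedule."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return max(0, min(seq_len, math.floor(ratio * seq_len)))


def tokens_to_remask(scores, n):
    """Indices of the n lowest-confidence tokens, which get re-masked for the next step."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[:n]
```

The schedule unmasks slowly at first and quickly at the end, reaching zero masked tokens on the final step.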
|
|
e108c54daf
|
new NAR-len training paradigm......
|
2024-11-07 11:32:11 -06:00 |
|
|
ed174c589e
|
ugh
|
2024-11-07 09:19:21 -06:00 |
|
|
d13ab00ad8
|
one more note
|
2024-11-07 09:11:21 -06:00 |
|
|
5698188824
|
I'm really such an idiot
|
2024-11-07 09:10:18 -06:00 |
|
|
77ff23e319
|
repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more)
|
2024-11-06 23:29:53 -06:00 |
|
|
a3bc26f7ec
|
ugh
|
2024-11-06 23:16:28 -06:00 |
|
|
d606a693ff
|
eval fix for nar-len
|
2024-11-06 23:14:16 -06:00 |
|
|
105ed51159
|
I guess I'll fall for the NAR-len meme again (I don't know where my previous weights are, so I need to train it again to test something)
|
2024-11-06 19:17:12 -06:00 |
|
|
bcabde3454
|
more notes
|
2024-11-06 13:51:28 -06:00 |
|
|
bfc5e1d723
|
agony
|
2024-11-05 22:30:49 -06:00 |
|
|
aefe8fcdad
|
UGH
|
2024-11-05 22:13:58 -06:00 |
|
|
556d9db0d5
|
web UI support for HF ZeroGPU
|
2024-11-05 21:38:02 -06:00 |
|
|
e58a9469a3
|
move layerskip to experimental settings.......
|
2024-11-05 20:37:06 -06:00 |
|
|
bbc2de3713
|
ugh
|
2024-11-05 11:50:05 -06:00 |
|
|
9e65e05e83
|
more windows specific fixes, limit gradio to <5.0.0 on linux (it works on windows, but not on my linux machine tm)
|
2024-11-04 18:00:33 -06:00 |
|
|
c83670c38c
|
Windows specific fixes (to-do: find libespeak-ng.dll automatically because it cannot be trusted to do it by default)
|
2024-11-03 19:19:15 -06:00 |
|
|
d229725c76
|
more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)
|
2024-11-03 18:31:28 -06:00 |
|
|
aee08b7307
|
changed layerskip float16 training warning (since it didn't seem to fry on my 4xV100 system)
|
2024-11-03 09:58:29 -06:00 |
|
|
3826f9bae4
|
saner mask creation? (it doesn't matter, KV cache won't work)
|
2024-11-02 21:00:21 -05:00 |
|
|
ded746e157
|
very, very naive layerskip speculative sampling (it just checks if the current layer's state is good enough)
|
2024-11-02 11:49:05 -05:00 |
|
|
62fe5b0943
|
ughh
|
2024-11-01 22:36:48 -05:00 |
|
|
ec79230965
|
shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)
|
2024-11-01 21:30:06 -05:00 |
|
|
ef1c17430f
|
skip step on nan loss (ironically I have not had a nan loss after adding this), throw exception with invalid cfg.dataset.sample_type and sample_order combination (because I was tricked by this in my yaml and had inconsistent vram usage)
|
2024-11-01 20:54:53 -05:00 |
|
|
fb8faa295b
|
actually float16(+AMP) and layerskip is bad and will kill the model......
|
2024-11-01 18:36:44 -05:00 |
|
|
edf1e66bf9
|
layerskip_r=6 fries the model so hard the loss is sub-1...
|
2024-11-01 17:06:07 -05:00 |
|
|
9b6c57bc57
|
third time's the charm (for some reason it escaped me that I should treat early exit loss as an aux_loss to be used with the normal loss, as if I was training a MoE's router)
|
2024-11-01 12:50:37 -05:00 |
|
|
76ebef45dc
|
off-by-one...
|
2024-10-31 13:24:48 -05:00 |
|
|
b63293cbbe
|
ugh
|
2024-10-30 22:49:11 -05:00 |
|
|
a22534e8f4
|
layer skip training implemented (need to gut the inferencing from the repo, and to actually see if the model can benefit from this)
|
2024-10-30 20:05:45 -05:00 |
|
|
4049f51ba9
|
added option to load lora directly from the model file itself with --lora
|
2024-10-26 00:13:10 -05:00 |
|
|
ccf71dc1b6
|
added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided
|
2024-10-25 22:15:15 -05:00 |
|
|
a96f5aee32
|
adjusted how i want to pass eval kwargs
|
2024-10-25 20:38:09 -05:00 |
|
|
92e6bff6dc
|
actually ar temp 0.5 with rep pen 1.125 seems to have the benefits of better outputs without it degrading some of the time but not all the time
|
2024-10-23 00:03:35 -05:00 |
|
|
8920e5e86b
|
actually have beam_width in the webUI work
|
2024-10-22 22:06:22 -05:00 |
|
|
910571ad34
|
too brainlet to diagnose why low temp / greedy sampling is randomly unstable some of the time
|
2024-10-22 20:13:54 -05:00 |
|
|
8eb9a4056b
|
modified default arguments (ar temp = 0 and rep pen = 1.125 seems to be stable, at least given the few things i tested), do not pass top k/top p/min p to NAR even though technically none of those things should matter when greedy sampling
|
2024-10-22 18:12:39 -05:00 |
|
|
1a02cd5bce
|
modify demo template to say F5 instead of YourTTS, swap LoRA comparison around to make the lora'd the base file, and the no-lora the suffix'd file
|
2024-10-21 19:52:02 -05:00 |
|
|
02dfc60ac3
|
ugh
|
2024-10-18 17:23:22 -05:00 |
|
|
71731ed785
|
added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)
|
2024-10-18 17:19:52 -05:00 |
|
|
6b04c13c56
|
print warning if audio promptless inferencing with low AR temp (it really doesn't like low temps / greedy sampling)
|
2024-10-18 17:01:40 -05:00 |
|
|
c8f31db1de
|
default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)
|
2024-10-18 16:58:56 -05:00 |
|
|
fc8dfd8617
|
made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)
|
2024-10-18 16:55:00 -05:00 |
|
|
07f4935a75
|
more tweaks
|
2024-10-18 13:19:36 -05:00 |
|
|
0dfab973e7
|
oops
|
2024-10-18 09:40:06 -05:00 |
|
|
75b90be325
|
cleaned up unused config flags, allow less strict yaml by pruning missing keys, renamed some dataset configs to be more unified
|
2024-10-17 17:06:48 -05:00 |
|
|
8b6095f681
|
saner defaults, maybe
|
2024-10-17 14:37:21 -05:00 |
|
|
f88097ccf6
|
add config option to set the rate of sampling randomly vs similar speakers during training
|
2024-10-16 14:27:58 -05:00 |
|
|
48461833c2
|
ugh
|
2024-10-15 19:30:43 -05:00 |
|
|
eea70f5698
|
kludge fix for an oversight in the model when trying to train for longer input prompt durations......
|
2024-10-15 19:25:03 -05:00 |
|
|
84005c5b00
|
entropix apparently processes the entire sequence of logits but it falls apart when doing that
|
2024-10-13 12:01:12 -05:00 |
|
|
c800d28bb8
|
respect attention defined in the yaml for web UI (which might explain why there's been a discrepancy in outputs for me)
|
2024-10-13 11:02:24 -05:00 |
|
|
ed6b7a690f
|
ugh.........
|
2024-10-13 00:26:46 -05:00 |
|
|
d405f243d4
|
at wits end in trying to output the right attention scores
|
2024-10-12 23:53:13 -05:00 |
|
|
70cf694cfd
|
output attention scores for SDPA/flash, since naive attention seems broken
|
2024-10-12 12:09:17 -05:00 |
|
|
541e45263c
|
ugh
|
2024-10-12 11:29:16 -05:00 |
|
|
04e983b86b
|
modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now
|
2024-10-12 11:27:55 -05:00 |
|
|
666e8038fb
|
ugh
|
2024-10-12 10:41:35 -05:00 |
|
|
3d6ef9666b
|
overridden naive llama attention to get the right score values that entropix needs
|
2024-10-12 10:05:47 -05:00 |
|
|
40b089daf3
|
lol
|
2024-10-12 09:57:34 -05:00 |
|
|
d6f7c86a5c
|
entropix tweaks (it doesn't output garbage but it loves to go for silence)
|
2024-10-12 09:46:18 -05:00 |
|
|
d0ab7d755a
|
added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix
|
2024-10-11 22:36:06 -05:00 |
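Min-p sampling keeps only tokens whose probability is at least `min_p` times the most likely token's probability. A minimal sketch of that filter (the function name is hypothetical):

```python
import math


def min_p_filter(logits, min_p=0.1):
    """Mask out tokens whose probability is below min_p times the top token's probability."""
    m = max(logits)
    # exp(x - max) is each token's probability relative to the top token, so no full softmax is needed
    rel = [math.exp(x - m) for x in logits]
    return [x if r >= min_p else -math.inf for x, r in zip(logits, rel)]
```

Because the cutoff is relative to the top token, it tightens when the model is confident and loosens when the distribution is flat.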
|
|
bef43a0c18
|
added experimental entropix sampling support
|
2024-10-11 21:18:26 -05:00 |
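Entropix-style samplers branch on the entropy and varentropy of the next-token distribution. Entropy is the expected surprisal `-log p`, and varentropy is its variance. A minimal sketch computing both in nats (the thresholds the sampler compares them against are not shown here):

```python
import math


def entropy_varentropy(logits):
    """Entropy (nats) and varentropy of the softmax distribution over `logits`."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    logp = [x - lse for x in logits]
    p = [math.exp(lp) for lp in logp]
    entropy = -sum(pi * lpi for pi, lpi in zip(p, logp))
    # variance of the surprisal: E[(-log p - H)^2] == E[(log p + H)^2]
    varentropy = sum(pi * (lpi + entropy) ** 2 for pi, lpi in zip(p, logp))
    return entropy, varentropy
```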
|
|
85d85c1351
|
more arg creep for demo page
|
2024-10-10 19:40:01 -05:00 |
|
|
301468f519
|
<<
|
2024-10-10 19:13:52 -05:00 |
|
|
75a4c866d6
|
more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)
|
2024-10-10 19:04:12 -05:00 |
|
|
96d05be73c
|
demo page tweaks
|
2024-10-10 13:52:37 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
52299127ab
|
fix vall_e.emb.process
|
2024-10-08 20:00:34 -05:00 |
|
|
0656a762af
|
fix vall_e.emb.transcriber
|
2024-10-08 19:24:43 -05:00 |
|
|
acdce66d4e
|
readme tweaks, set the (unused) default model download URL back to the base ar+nar-llama-8 model, as ar+nar-tts+stt-llama-8 was renamed back to it since it performs well
|
2024-10-05 22:53:53 -05:00 |
|
|
84c7419001
|
faster
|
2024-10-04 22:30:47 -05:00 |
|
|
a507b769a1
|
sped up inferencing by not doing .tolist() for rep pen / length pen (and a bug fix in the web UI from prev commit)
|
2024-10-04 22:18:20 -05:00 |
|
|
4a8e3ccf06
|
README tweaks, added --input-prompt-prefix as an experiment (it's literally better to just not do this, but I'll retain it in case I have a revelation on how to improve it)
|
2024-10-04 18:57:19 -05:00 |
|
|
a9fa0898a9
|
tweaked demo page script to sample speakers instead
|
2024-09-28 10:50:26 -05:00 |
|
|
2f1dca3089
|
added language selection in web UI, tweaked demo script
|
2024-09-28 09:49:45 -05:00 |
|
|
10df2ef5f3
|
fixed oversight where input audio does not resample (lol...)
|
2024-09-27 20:27:53 -05:00 |
|
|
039482a48e
|
don't do eval on stt because it's so slow and I don't even bother doing any metrics against it anyways (to-do: make this a flag)
|
2024-09-26 18:56:57 -05:00 |
|
|
ff7a1b4163
|
coerce into path for other sampler_types (it's required for sampling for similar utterances)
|
2024-09-26 18:37:56 -05:00 |
|