|
6845c447c9
|
added more harvard sentences to load from a text file
|
2024-11-21 13:18:11 -06:00 |
|
|
2a084544e8
|
moved duration padding for NAR-len to be a scalar instead (since it seems longer utterances need it much more so than shorter utterances)
|
2024-11-21 13:04:07 -06:00 |
|
|
6aee08f9c0
|
moved stuff in the web UI around (un-experimented the max NAR-len steps because its kind of important to adjust this value for better sounding audio / quicker generated audio)
|
2024-11-20 20:37:33 -06:00 |
|
|
67f7bad168
|
added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)
|
2024-11-20 14:22:12 -06:00 |
|
|
b1369e7824
|
better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)
|
2024-11-19 18:51:17 -06:00 |
|
|
5ba80686e1
|
two weeks of agony concludes
|
2024-11-18 21:29:28 -06:00 |
|
|
6cfdf94bf9
|
swap priority to use nar-len if available, added notes
|
2024-11-18 09:40:04 -06:00 |
|
|
39096f8ff3
|
redid loss calculation to be cleaner, and position ID generation, and other things (I might need to train the NAR-len from scratch and not resume from an existing checkpoint.........)
|
2024-11-14 22:17:47 -06:00 |
|
|
2f56696506
|
overhauled inference/sampler kwargs to stop being a bloated mess
|
2024-11-11 20:21:16 -06:00 |
|
|
9cb0b6901b
|
unified nar.py into ar_nar.py
|
2024-11-10 12:19:48 -06:00 |
|
|
a9d2faf2d7
|
all I can do now until I wait for the model to (re)train for pure NAR
|
2024-11-09 22:57:34 -06:00 |
|
|
77ff23e319
|
repeat extend the prom to fill the initial tokens for nar-len (it somewhat works, the model just needs to train more)
|
2024-11-06 23:29:53 -06:00 |
|
|
d229725c76
|
more adjustments (adjustments of early-exit entropy/varentropy thresholds, default rep pen being 1.5, experimental refine-on-stop, etc.)
|
2024-11-03 18:31:28 -06:00 |
|
|
aee08b7307
|
changed layerskip float16 training warning (since it didnt seem to fry on my 4xV100 system)
|
2024-11-03 09:58:29 -06:00 |
|
|
ec79230965
|
shuffled web UI options hidden by cfg.experimental to its own tab, expose early exit selection to inferencing (it kinda works naively, still need to implement self-speculation)
|
2024-11-01 21:30:06 -05:00 |
|
|
4049f51ba9
|
added option to load lora directly from the model file itself with --lora
|
2024-10-26 00:13:10 -05:00 |
|
|
ccf71dc1b6
|
added option to load from a model state dict directly instead of a yaml (to-do: do this for LoRAs too), automatically download the default model if none is provided
|
2024-10-25 22:15:15 -05:00 |
|
|
71731ed785
|
added prefixing with silence (was to test something, currently hidden under cfg.experimental=True)
|
2024-10-18 17:19:52 -05:00 |
|
|
6b04c13c56
|
print warning if audio promtpless inferencing with low AR temp (it really doesn't like low temps / greedy sampling)
|
2024-10-18 17:01:40 -05:00 |
|
|
c8f31db1de
|
default to greedy sample AR (i should probably test this more but it seems to pass my harvard sentences and tongue twisters)
|
2024-10-18 16:58:56 -05:00 |
|
|
fc8dfd8617
|
made greedy AR sampling viable (and preferable), with caveats (per comment in vall_e.models.ar_nar)
|
2024-10-18 16:55:00 -05:00 |
|
|
8b6095f681
|
saner defaults, maybe
|
2024-10-17 14:37:21 -05:00 |
|
|
48461833c2
|
ugh
|
2024-10-15 19:30:43 -05:00 |
|
|
eea70f5698
|
kludge fix for an oversight in the model when trying to train for longer input prompt durations......
|
2024-10-15 19:25:03 -05:00 |
|
|
04e983b86b
|
modified demo page to be more modular with demoing comparisons, actually provide a path to use modified naive attention, entropix sampling is not tied to an experimental yaml flag now
|
2024-10-12 11:27:55 -05:00 |
|
|
d0ab7d755a
|
added min-p (really does not seem useful since it's very sensitive), more tweaks to entropix
|
2024-10-11 22:36:06 -05:00 |
|
|
75a4c866d6
|
more demo page tweaks, added arg to force enable/disable LoRAs for inferencing (to-do: setup arg flags to handle this, and checkbox in web UI)
|
2024-10-10 19:04:12 -05:00 |
|
|
2ea978f318
|
added --eval-random-text-prompts to use random text prompts for eval pass, added --random-prompts for demo page and --lora to use a sample with the lora disabled, probably finally fixed validation dataloader breaking on eval
|
2024-10-10 13:40:25 -05:00 |
|
|
4a8e3ccf06
|
README tweaks, added --input-prompt-prefix as an experiment (its literally better to just not do this, but i'll retain it in case i have a revelation on how to improve it)
|
2024-10-04 18:57:19 -05:00 |
|
|
4f3c7a37c8
|
also do text similarities (dont know what use I'll have for this)
|
2024-09-10 16:45:59 -05:00 |
|
|
1c615a0f52
|
helper script (vall_e.emb.similar) to figure out the best way to compute similarity scores for audio (iunno how to go about it desu)
|
2024-09-10 16:34:23 -05:00 |
|
|
54203c059d
|
validated rep pen for STT (sometimes needed to wrangle the model)
|
2024-09-08 08:30:30 -05:00 |
|
|
a6ad0577b8
|
cleanup the resultant text from STT
|
2024-09-06 18:44:25 -05:00 |
|
|
4bd9bb39c8
|
webui for STT (still need to bake the model to handle it better, a few hours so far has it generate what looks like a normal transcription but does not correlate to the audio right now)
|
2024-09-06 15:13:04 -05:00 |
|
|
94cf81d38c
|
tweak
|
2024-09-05 23:21:18 -05:00 |
|
|
32287710a2
|
moved prints to use logger, edited readme (fused_attn doesnt seem stable for training)
|
2024-08-29 13:27:16 -05:00 |
|
|
b7b99a25f1
|
added ability to specify attention backend for CLI and webui (because im tired of editing the yaml)
|
2024-08-26 19:33:51 -05:00 |
|
|
d7c6be6f78
|
fix weird regression in handling checkpoints when backend is local, but deepspeed checkpoints are in (it was handled with LoRA loading but not real loading...)
|
2024-07-30 22:15:56 -05:00 |
|
|
c2f5b916fc
|
added what I think is DRY sampling
|
2024-07-29 19:15:07 -05:00 |
|
|
75b04686f8
|
added prom-less training / inferencing, some other things
|
2024-07-22 19:36:07 -05:00 |
|
|
d87b492295
|
added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that)
|
2024-07-19 20:49:40 -05:00 |
|
|
3acc54df22
|
allow loading a different model within the web ui (apparently I did not have the web UI in the documentation)
|
2024-07-15 19:59:48 -05:00 |
|
|
bc2a6fa756
|
sanity cleanup: moved experimental features under its own thing
|
2024-06-30 10:37:33 -05:00 |
|
|
8fffb94964
|
backport fix from tortoise_tts with local trainer + loading state when training lora
|
2024-06-25 13:41:29 -05:00 |
|
|
bcf3910a17
|
the NAR only dream is dead (it just won't work)
|
2024-06-12 19:49:47 -05:00 |
|
|
a7a6e0ac76
|
validated that inferencing works, changed some defaults (NAR benefits from greedy sampling)
|
2024-06-09 17:11:38 -05:00 |
|
|
da8242d086
|
finally got around to removing omegaconf
|
2024-06-07 20:23:53 -05:00 |
|
|
b2194b859a
|
re-added loading multiple models because I'm now entertaining having split AR/NAR models again (and need a way to load both at once)
|
2024-06-06 09:48:43 -05:00 |
|
|
ddbacde0d1
|
DAC just doesn't work well enough......
|
2024-05-25 11:07:52 -05:00 |
|
|
ffa200eec7
|
added option to specify frames per second for the given audio representation (Encodec is 75Hz, DAC is 41Hz (at 24K sources))
|
2024-05-04 12:05:41 -05:00 |
|