• https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.

    XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG

  • Joined on 2022-10-10
mrq pushed to master at mrq/vall-e 2024-11-21 02:33:09 +00:00
6aee08f9c0 moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)
mrq pushed to master at mrq/vall-e 2024-11-21 02:31:33 +00:00
d75f220647 moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)
mrq pushed to master at mrq/vall-e 2024-11-21 01:16:47 +00:00
dfdba3f190 oops
mrq pushed to master at mrq/vall-e 2024-11-20 22:23:29 +00:00
cd6e9ba2f2 oops
mrq pushed to master at mrq/vall-e 2024-11-20 22:06:29 +00:00
1a73ac6a20 I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics)
mrq pushed to master at mrq/vall-e 2024-11-20 20:18:02 +00:00
67f7bad168 added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)
mrq pushed to master at mrq/vall-e 2024-11-20 18:28:41 +00:00
db64e6cb59 dependency updates (gradio 5.x now works on my machine)
mrq pushed to master at mrq/vall-e 2024-11-20 01:15:13 +00:00
efeb55e1b7 documentation update
mrq pushed to master at mrq/vall-e 2024-11-20 00:46:59 +00:00
b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)
mrq pushed to master at mrq/vall-e 2024-11-19 18:20:16 +00:00
190a917b3e I did it.
mrq pushed to master at mrq/vall-e 2024-11-19 16:25:46 +00:00
0e621354e7 cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model)
mrq pushed to master at mrq/vall-e 2024-11-19 03:25:12 +00:00
5ba80686e1 two weeks of agony concludes
mrq pushed to master at mrq/vall-e 2024-11-18 20:08:02 +00:00
2b29790173 oops
mrq pushed to master at mrq/vall-e 2024-11-18 18:42:27 +00:00
4a71981456 normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things
mrq pushed to master at mrq/vall-e 2024-11-18 15:35:43 +00:00
6cfdf94bf9 swap priority to use nar-len if available, added notes
mrq pushed to master at mrq/vall-e 2024-11-17 22:59:43 +00:00
069b27570f set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)
mrq pushed to master at mrq/vall-e 2024-11-17 22:55:44 +00:00
538fbc1ce3 set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)
mrq pushed to master at mrq/vall-e 2024-11-17 16:19:16 +00:00
88d840218d default set cfg strength to 3.0 since the reference model is updated
mrq pushed to master at mrq/vall-e 2024-11-17 15:24:12 +00:00
mrq pushed to master at mrq/vall-e 2024-11-16 21:46:01 +00:00
23fdba0c98 tweaks and changes