mrq - ecker.tech

mrq

https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.

XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG
Joined on 2022-10-10

mrq pushed to master at mrq/vall-e

2024-11-21 02:33:09 +00:00

6aee08f9c0 moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)

mrq pushed to master at mrq/vall-e

2024-11-21 02:31:33 +00:00

d75f220647 moved stuff in the web UI around (un-experimented the max NAR-len steps because it's kind of important to adjust this value for better sounding audio / quicker generated audio)

mrq pushed to master at mrq/vall-e

2024-11-21 01:16:47 +00:00

dfdba3f190 oops

mrq pushed to master at mrq/vall-e

2024-11-20 22:23:29 +00:00

cd6e9ba2f2 oops

mrq pushed to master at mrq/vall-e

2024-11-20 22:06:29 +00:00

1a73ac6a20 I cannot believe it's not actually called Wand DB (added wandb logging support since I think it would have been a much better way to look at my metrics)

mrq pushed to master at mrq/vall-e

2024-11-20 20:18:02 +00:00

67f7bad168 added mixed modality AR+NAR-len to generate a short prefix through the AR, then inference with said prefix through the NAR-len (need to experiment with it more to ensure that the masked off tokens are the only tokens getting updated)

mrq pushed to master at mrq/vall-e

2024-11-20 18:28:41 +00:00

db64e6cb59 dependency updates (gradio 5.x now works on my machine)

mrq pushed to master at mrq/vall-e

2024-11-20 01:15:13 +00:00

efeb55e1b7 documentation update

mrq pushed to master at mrq/vall-e

2024-11-20 00:46:59 +00:00

b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model), lowered default CFG because it makes the AR+NAR output sped up (but can't be too low since it's required for the NAR-len)

mrq pushed to master at mrq/vall-e

2024-11-19 18:20:16 +00:00

190a917b3e I did it.

mrq pushed to master at mrq/vall-e

2024-11-19 16:25:46 +00:00

0e621354e7 cleaned up classifier-free guidance logit processing (in order to try and cope with a bad nar-len model)

mrq pushed to master at mrq/vall-e

2024-11-19 03:25:12 +00:00

5ba80686e1 two weeks of agony concludes

mrq pushed to master at mrq/vall-e

2024-11-18 20:08:02 +00:00

2b29790173 oops

mrq pushed to master at mrq/vall-e

2024-11-18 18:42:27 +00:00

4a71981456 normalize sampler index by batch size (if not using batched sampler), add option to cap out utterances for a speaker, some other things

mrq pushed to master at mrq/vall-e

2024-11-18 15:35:43 +00:00

6cfdf94bf9 swap priority to use nar-len if available, added notes

mrq pushed to master at mrq/vall-e

2024-11-17 22:59:43 +00:00

069b27570f set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)

mrq pushed to master at mrq/vall-e

2024-11-17 22:55:44 +00:00

538fbc1ce3 set option to set training masking ratio (I don't think for tts a fixed masking ratio is beneficial since the magic of the AR+NAR is being able to still reference the prior sequence of tokens for predicting things)

mrq pushed to master at mrq/vall-e

2024-11-17 16:19:16 +00:00

88d840218d default set cfg strength to 3.0 since the reference model is updated

mrq pushed to master at mrq/vall-e

2024-11-17 15:24:12 +00:00

a3e1fa3518 ugh

mrq pushed to master at mrq/vall-e

2024-11-16 21:46:01 +00:00

23fdba0c98 tweaks and changes
