• https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.

    XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG

  • Joined on 2022-10-10
mrq pushed to master at mrq/vall-e 2024-11-20 01:15:13 +00:00
efeb55e1b7 documentation update
mrq pushed to master at mrq/vall-e 2024-11-20 00:46:59 +00:00
b1369e7824 better modality selection (pick AR+NAR by default for the ar+nar model, pick NAR-len by default for the nar-len model); lowered the default CFG, since a high CFG makes the AR+NAR output sound sped up (but it can't be too low, since the NAR-len still requires it)
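
A minimal sketch of the default-modality pick described above, assuming the model variant is known as a plain string; the function and labels are hypothetical, not the repo's actual API:

    def pick_default_modality(model_type: str) -> str:
        # the nar-len (masked, length-based) model defaults to its NAR-len pass
        if model_type == "nar-len":
            return "nar-len"
        # the ar+nar model defaults to standard two-stage AR+NAR inference
        return "ar+nar"
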
mrq pushed to master at mrq/vall-e 2024-11-19 18:20:16 +00:00
190a917b3e I did it.
mrq pushed to master at mrq/vall-e 2024-11-19 16:25:46 +00:00
0e621354e7 cleaned up classifier-free guidance logit processing (to try to cope with a bad nar-len model)
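
The logit processing above is presumably the usual classifier-free guidance blend; a hedged PyTorch sketch, with the 3.0 default taken from commit 88d840218d further down (the feed is newest-first, so that commit is earlier in time):

    import torch

    def cfg_logits(cond: torch.Tensor, null: torch.Tensor, cfg_strength: float = 3.0) -> torch.Tensor:
        # push the conditional logits away from the null (unconditioned) pass
        return null + cfg_strength * (cond - null)
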
mrq pushed to master at mrq/vall-e 2024-11-19 03:25:12 +00:00
5ba80686e1 two weeks of agony concludes
mrq pushed to master at mrq/vall-e 2024-11-18 20:08:02 +00:00
2b29790173 oops
mrq pushed to master at mrq/vall-e 2024-11-18 18:42:27 +00:00
4a71981456 normalize the sampler index by batch size (when not using the batched sampler), add an option to cap the number of utterances per speaker, some other things
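
A rough guess at what normalizing the sampler index means here: a plain sampler advances once per sample while a batched sampler advances once per batch, so dividing by batch size keeps resume bookkeeping comparable. Names are hypothetical:

    def normalized_index(sample_index: int, batch_size: int, is_batched: bool) -> int:
        # a batched sampler already counts in batches; a plain one counts samples
        return sample_index if is_batched else sample_index // batch_size
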
mrq pushed to master at mrq/vall-e 2024-11-18 15:35:43 +00:00
6cfdf94bf9 swapped priority to use the nar-len if available, added notes
mrq pushed to master at mrq/vall-e 2024-11-17 22:59:43 +00:00
069b27570f added an option to set the training masking ratio (I don't think a fixed masking ratio is beneficial for TTS, since the magic of the AR+NAR is still being able to reference the prior sequence of tokens when predicting)
mrq pushed to master at mrq/vall-e 2024-11-17 22:55:44 +00:00
538fbc1ce3 added an option to set the training masking ratio (I don't think a fixed masking ratio is beneficial for TTS, since the magic of the AR+NAR is still being able to reference the prior sequence of tokens when predicting)
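
A sketch of such a masking-ratio option, assuming a uniform random ratio as the non-fixed default (the repo's actual schedule may differ):

    import random

    def training_mask_ratio(fixed_ratio: float | None = None) -> float:
        # a fixed ratio masks the same fraction every step; leaving it unset
        # keeps a random ratio, so the model still learns to lean on however
        # many prior tokens happen to be unmasked
        return fixed_ratio if fixed_ratio is not None else random.random()
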
mrq pushed to master at mrq/vall-e 2024-11-17 16:19:16 +00:00
88d840218d set the default cfg strength to 3.0 since the reference model is updated
mrq pushed to master at mrq/vall-e 2024-11-17 15:24:12 +00:00
mrq pushed to master at mrq/vall-e 2024-11-16 21:46:01 +00:00
23fdba0c98 tweaks and changes
mrq pushed to master at mrq/vall-e 2024-11-15 04:14:27 +00:00
mrq pushed to master at mrq/vall-e 2024-11-15 04:14:07 +00:00
39096f8ff3 redid the loss calculation and position ID generation to be cleaner, among other things (I might need to train the NAR-len from scratch instead of resuming from an existing checkpoint...)
mrq pushed to master at mrq/vall-e 2024-11-14 15:19:32 +00:00
ef05c951ff adjust fp16 loss scaling since I fried a model overnight when it hit 8K scale
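
One way to keep the dynamic fp16 loss scale from running away like that, using PyTorch's stock GradScaler; the 4096 cap and 1024 initial scale are hypothetical values, not the repo's:

    from torch.cuda.amp import GradScaler

    MAX_SCALE = 4096.0  # hypothetical cap, below the 8K scale that fried the model
    scaler = GradScaler(init_scale=1024.0)

    def training_step(optimizer, loss):
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        # clamp the dynamic scale so growth can't push it past the cap
        if scaler.get_scale() > MAX_SCALE:
            scaler.update(new_scale=MAX_SCALE)
        optimizer.zero_grad()
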
mrq pushed to master at mrq/vall-e 2024-11-14 13:30:22 +00:00
mrq pushed to master at mrq/vall-e 2024-11-13 23:59:42 +00:00
c00fc18b62 actually use the right embedding for nar-len
mrq pushed to master at mrq/vall-e 2024-11-13 20:22:53 +00:00
3ea8a610d6 fix STT
mrq pushed to master at mrq/vall-e 2024-11-13 19:26:59 +00:00
910033343c overhauled how the right resp level / classifier gets picked to avoid cringemath
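
A guess at the shape of that overhaul: index a per-level classifier head directly rather than computing token offsets, as in this hedged sketch (class name and dimensions made up):

    import torch
    import torch.nn as nn

    class LevelClassifiers(nn.Module):
        def __init__(self, d_model: int = 1024, n_tokens: int = 1024, n_levels: int = 8):
            super().__init__()
            # one output head per RVQ / resp level
            self.heads = nn.ModuleList([nn.Linear(d_model, n_tokens) for _ in range(n_levels)])

        def forward(self, hidden: torch.Tensor, level: int) -> torch.Tensor:
            # direct indexing by level replaces offset arithmetic on one big head
            return self.heads[level](hidden)
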