mrq
-
https://git.ecker.tech/ aims to provide a place to share my efforts while maintaining true ownership of my code, as I do not trust GitHub.
XMR: 4B9TQdkAkBFYrbj5ztvTx89e5LpucPeTSPzemCihdDi9EBnx7btn8RDNZTBz2zihWsjMnDkzn5As1LU6gLv3KQy8BLsZ8SG
- Joined on 2022-10-10
9da630f73a
swap order of demo entries, as the model prioritizes adhering to the speaker prompt more (instead of trying to match the ground truth magically)
536c11c4ac
actually validated and fixed sampling similar utterances for the prompt (hopefully nothing else is needed)
d31f27119a
regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens
fe241f6a99
support for wildcard in training/validation/noise dataset array (to-do: a better way to query between metadata folder and data folder)
b5bec0c9ce
oops, turns out these are not split by speaker names already........ (also added sampling the dataset in the webui for easy viewing)
ebac1db16c
maybe final tweaks, I really needed to unify my json read/write and orjson is proven to be fast enough for me to try and rely on it more
804ddb5182
optimizations (6 hours to do cosine similarities on a speaker set of just 17k utterances................)