My naivety left it in as-is, and had whatever DLAS/tortoise uses handle it (which seems to be unidecode). I probably should have validated more, but naturally, hindsight is 20/20.
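For anyone curious what that fallback actually does, here's a minimal sketch, assuming it really is unidecode doing the transliteration:

```python
# Minimal sketch of unidecode's behavior on non-ASCII text (assuming
# that's what DLAS/tortoise falls back to, as mentioned above).
from unidecode import unidecode

print(unidecode("Živjo"))      # "Zivjo"      -- diacritics are just stripped
print(unidecode("こんにちは"))   # "konnichiha" -- kana romanize plausibly
print(unidecode("日本語"))      # "Ri Ben Yu"  -- kanji fall back to Mandarin readings
```

That last line is why Japanese gets mangled: kanji come out with Chinese readings rather than anything a Japanese speaker would say.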
Do the…
I've had Japanese finetunes that ranged from unfavorable to decent, but those problems stemmed from other issues I've since resolved.
Did you convert all the Kanji in your training set to…
@psammites Yes, it's correct.
So before a vowel is it labiodental? Or is it bilabial and part of a diphthong?
`phonemize` and the SAMPA included in [SOFES](https://www.clarin.si/repository/xm…
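A quick sketch of the call I mean, assuming the phonemizer package with the espeak-ng backend (the sample sentence is just an illustration):

```python
from phonemizer import phonemize

# espeak-ng's Slovenian voice is "sl"; output is IPA by default.
ipa = phonemize("Dober dan", language="sl", backend="espeak", strip=True)
print(ipa)
```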
@nk990 Could you have a look at Wikipedia:Slovene Phonology and tell me how accurate it is? In particular the /ʋ/ section:
- Before a…
I saw you started training the Slovenian model. Any results? Could you share some examples?
@nk990 As of 200 iterations it wasn't any good. I might have picked the wrong learning rates, I'll…
@nk990 Do you have an IPA-annotated Slovenian dataset? I added the missing symbols to models/tokenizers/ipa.json, but SOFES is transcribed in SAMPA and the UCLA Phonetics Lab Slovenian corpus is…
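If nothing IPA-annotated turns up, I might just convert the SAMPA mechanically. A rough sketch of the kind of conversion I mean; the table is my own partial guess at the symbols Slovenian needs, not anything taken from SOFES:

```python
# Partial SAMPA -> IPA table (my own assumption; not exhaustive).
SAMPA_TO_IPA = {
    "tS": "tʃ", "dZ": "dʒ", "v\\": "ʋ",   # multi-character symbols
    "E": "ɛ", "O": "ɔ", "S": "ʃ", "Z": "ʒ",
    "@": "ə", "N": "ŋ",
    '"': "ˈ", "%": "ˌ",                   # stress marks
}

def sampa_to_ipa(s: str) -> str:
    # Greedy longest-match scan so "tS" wins over "t" followed by "S".
    keys = sorted(SAMPA_TO_IPA, key=len, reverse=True)
    out, i = [], 0
    while i < len(s):
        for k in keys:
            if s.startswith(k, i):
                out.append(SAMPA_TO_IPA[k])
                i += len(k)
                break
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

print(sampa_to_ipa('"tSE'))  # ˈtʃɛ
```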
Tried again, added some missing IPAs and merges; sounds like James Sunderland if he was Fr*nch: https://vocaroo.com/1c9ylHLoNrwU ("You're not Mary." => `jʊɹ nɑːt mɛɹi.`).
That's with…
Although, redefining the token vocab might mean having to retrain an entirely new model.
I might take a crack at it this weekend, weather permitting. Now that you've added the option to select…
$env:PHONEMIZER_ESPEAK_LIBRARY='C:\Program Files\eSpeak NG\libespeak-ng.dll'
I both used `set` and manually declared the env var in whatever Windows settings panel, and it didn't like that. I don't…
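One workaround that sidesteps the shell entirely, assuming the variable only needs to reach the Python process: set it in-process before the espeak backend is first used.

```python
import os

# Set this before phonemizer's espeak backend is first used;
# it tells it where to find the DLL.
os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = (
    r"C:\Program Files\eSpeak NG\libespeak-ng.dll"
)

from phonemizer import phonemize
print(phonemize("test", language="en-us", backend="espeak"))
```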
As I understand it, if I provide my own dataset, then I have to provide my own inference model trained on it.
No shit. You already need to provide your own model anyways, as there's no…
This doesn't look like anywhere near enough coverage to do true multi-lingual speech. Unless they're getting it somewhere else?
That's generated from the LibriTTS dataset specifically.…
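If you want to check coverage yourself, here's a rough sketch, assuming an LJSpeech-style "path|text" list and a BPE-style tokenizer.json (the file names are illustrative):

```python
import json

# Characters in the transcripts that have no standalone token in the vocab.
with open("models/tokenizers/ipa.json", encoding="utf-8") as f:
    vocab = set(json.load(f)["model"]["vocab"])

with open("train.txt", encoding="utf-8") as f:  # hypothetical dataset list
    chars = {ch for line in f for ch in line.rstrip("\n").split("|", 1)[1]}

print(sorted(chars - vocab))
```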
I started disabling validation because I got better results before it was introduced
Validation has no effect on training quality (it sometimes will eat up your total iteration count…
That's about pushing the limits of what you can do without replacing the tokenizer:
[ć] /tɕ/ -> [w]
[č] /tʃ/ -> [ch]
[d] /d/ -> [q]
[đ] /dʑ/ -> [x]
[dž] /dʒ/ -> [dzh]
[ž]…
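In practice that amounts to a preprocessing pass like this sketch, covering just the mappings listed above; doing it in a single regex pass avoids re-substituting the output of an earlier rule (e.g. the "d" inside "dzh"):

```python
import re

# Mirror of the substitution list above (truncated entries omitted).
REMAP = {"ć": "w", "č": "ch", "d": "q", "đ": "x", "dž": "dzh"}

# Longest alternatives first so "dž" matches before plain "d".
_pattern = re.compile(
    "|".join(sorted(map(re.escape, REMAP), key=len, reverse=True))
)

def remap(text: str) -> str:
    return _pattern.sub(lambda m: REMAP[m.group(0)], text)

print(remap("čokolada"))  # "chokolaqa"
```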
At last, I can train it to speak Ubykh (or at least pronounce გვფრცქვნი)!
@psammites Can you also share your experience with us? Like how many clips do you have in your…
I caved and added a way to override the tokenizer JSON under Settings, because I realized it actually does affect Japanese (at least, from seeing it merge the "phonemes"). The overridden tokenizer…
I'm not too sure of the implications, though. They're not true phonemes, rather virtual ones (in the loosest sense). In theory, if you were able to magically source a better-tuned tokenizer.json…
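If you do source one, a quick way to see what it actually does to a transcript, assuming these JSONs are HuggingFace tokenizers files (which is what they appear to be):

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_file("models/tokenizers/ipa.json")  # path from earlier in the thread
enc = tok.encode("jʊɹ nɑːt mɛɹi.")
print(enc.tokens)  # shows which merges fire -- the "virtual phonemes"
```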
Trim silence, text cull length 4, audio cull length 1 second.
Just to be explicit: "text cull length" is Validation Text Length Threshold, and "audio cull length" is Validation Audio…
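If I'm reading those settings right, the culling boils down to a filter like this sketch; the file names are illustrative, and soundfile is just one way to get durations:

```python
import soundfile as sf

MIN_TEXT_LEN = 4     # "text cull length 4"
MIN_AUDIO_SEC = 1.0  # "audio cull length 1 second"

kept = []
with open("train.txt", encoding="utf-8") as f:   # hypothetical "path|text" list
    for line in f:
        path, text = line.rstrip("\n").split("|", 1)
        if len(text) >= MIN_TEXT_LEN and sf.info(path).duration >= MIN_AUDIO_SEC:
            kept.append(line)

with open("train.culled.txt", "w", encoding="utf-8") as f:
    f.writelines(kept)
```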