|
a6c745bafb
|
chinese (mandarin?) support added (I guess I don't need pinyin, but tone markers are handled), korean validated, vocab adjusted
|
2024-12-09 14:26:19 -06:00 |
|
|
2266d34818
|
oops
|
2024-09-21 16:06:01 -05:00 |
|
|
d31f27119a
|
regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens
|
2024-09-21 12:29:28 -05:00 |
|
|
491ae2a684
|
some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...)
|
2024-07-22 00:30:40 -05:00 |
|
|
d9aabfa3ae
|
final tweaks, hopefully, again
|
2024-05-15 23:04:19 -05:00 |
|
|
6a11bc9cb6
|
update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk
|
2024-04-29 09:09:26 -05:00 |
|