Non-English Tokenizer #322

Open
opened 2023-08-03 17:18:45 +00:00 by Epentibi · 1 comment

Hello, I am planning to train a Chinese model. How would I generate a tokenizer similar to the Japanese one? Thanks.

It depends on whether you intend to process input in bopomofo, pinyin, or Chinese characters (中文).
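
For the Chinese-character case, a rough sketch of one possible starting point is a character-level vocabulary built from the training transcripts. This is not the project's actual tokenizer format, which this thread does not show; the special tokens and the function names `build_char_vocab` and `encode` are hypothetical. Pinyin or bopomofo input would need a separate text-to-phonetic conversion step before anything like this.

```python
from collections import Counter

# Hypothetical special tokens; the real tokenizer may use different ones.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[START]", "[STOP]"]


def build_char_vocab(lines, min_count=1):
    """Map each non-whitespace character in the corpus to an integer id."""
    counts = Counter(ch for line in lines for ch in line if not ch.isspace())
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Most frequent characters first; ties broken by code point for determinism.
    for ch, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert text to ids, mapping unseen characters to [UNK]."""
    unk = vocab["[UNK]"]
    return [vocab.get(ch, unk) for ch in text if not ch.isspace()]


corpus = ["你好世界", "你好"]  # stand-in for real training transcripts
vocab = build_char_vocab(corpus)
ids = encode("你好吗", vocab)  # "吗" is not in the corpus, so it maps to [UNK]
```

A character-level vocabulary keeps the sketch simple, but a real corpus contains thousands of distinct characters, which is one reason a phonetic representation may be preferable.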

Reference: mrq/ai-voice-cloning#322