Help Needed: Problem with Training #453

Open
opened 2023-11-17 20:51:39 +00:00 by Aeternus · 0 comments

I believe I am basically clueless about this. I am using https://zenodo.org/records/7265581 as my dataset, and the results I get are basically gibberish. I am not sure what I am doing wrong; my best guess is that my tokenizer is bad. I need help with a few things:

  1. If I am training for German, should all mentions of english_cleaners be changed to basic_cleaners?
  2. Is the tokenizer the most important factor in keeping the training from producing gibberish?
  3. Can the tokenizer be changed before and after training?
  4. What should I do with the dataset; does it need to be fixed in any way? Is Prepare Dataset required if I already have a transcription for each audio file?
  5. Any other tips and tricks?
Reference: mrq/ai-voice-cloning#453