Help Needed Problem Training #453

Aeternus · 2023-11-17T20:51:39Z

I am basically clueless about this. I am using https://zenodo.org/records/7265581 as my dataset, and the output I get is basically gibberish. I am not sure what I am doing wrong; my best guess is that my tokenizer is bad. I need help with a few things:

1. If I am training for German, should all mentions of english_cleaners be changed to basic_cleaners?
2. Is the tokenizer the most important factor in keeping training from producing gibberish?
3. Can the tokenizer be changed before and after training?
4. What should I do with the dataset? Does it need to be fixed in any way? Is Prepare Dataset required if I already have text for each audio file?
5. Any other tips and tricks?