A very strange Issue: "Exception: Empty dataset Error" #465

Open
opened 2023-12-19 15:27:49 +00:00 by DreamFIlmVFX · 2 comments

I'm using this tool to clone my own voice in Italian, so I set the transcription language to "it" in the Whisper config. I prepared my dataset following a well-explained tutorial and successfully transcribed a 12-minute dataset with Whisper, but all the transcriptions appear only in the validation.txt file; the train.txt file is empty. As far as I know, Italian is well supported by Whisper, so why is train.txt empty? When I go to the next step (the Generate Configuration tab), this is the error that appears (see the screenshots). I don't know how to get rid of it. I hope someone can help me. Thank you.

DreamFIlmVFX changed title from Exception: Empty dataset Error to A very strange Issue: "Exception: Empty dataset Error" 2023-12-19 15:32:58 +00:00

The problem is that the transcription is being written to the wrong file. You need to copy the transcribed caption data from the validation.txt file over to the train.txt file. It took me forever to notice.

Edit: this worked for an English-based project, but I am not sure whether it would work for other languages.
image
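
The manual copy described above could be scripted roughly as follows. This is only a minimal sketch, not part of the tool: the function name `fill_empty_train` and the dataset directory argument are hypothetical, and it assumes both files sit in the same folder and are plain UTF-8 text. It copies validation.txt over train.txt only when train.txt is missing or empty, so an existing training list is left alone.

```python
from pathlib import Path
import shutil


def fill_empty_train(dataset_dir: str) -> bool:
    """Copy validation.txt over train.txt if train.txt is missing or empty.

    Hypothetical helper: dataset_dir is assumed to be the folder that holds
    both files. Returns True if a copy was made, False otherwise.
    """
    root = Path(dataset_dir)
    train = root / "train.txt"
    validation = root / "validation.txt"

    # Nothing to copy if the validation file is absent or has no content.
    if not validation.is_file() or not validation.read_text(encoding="utf-8").strip():
        return False

    # Leave train.txt untouched if it already contains transcriptions.
    if train.is_file() and train.read_text(encoding="utf-8").strip():
        return False

    shutil.copyfile(validation, train)
    return True
```

Calling `fill_empty_train("path/to/your/dataset")` (the path is a placeholder) before opening the Generate Configuration tab would reproduce the workaround; whether it also helps for non-English datasets is untested here.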

I have the identical problem with standard English. Were you ever able to fix it? I raised a similar issue recently. I think it was caused by an update to one of the modules, but I don't know enough about the code to fix it. / G

Reference: mrq/ai-voice-cloning#465