Recreate dataset after correcting whisper.json doesn't overwrite the train.txt #389

Closed
opened 2023-09-17 19:37:16 +07:00 by DoctorPopi · 1 comment

Hey there,

After hitting "Transcribe and process", I went ahead and manually corrected whisper.json. I thought that hitting "Recreate Dataset" would automatically update train.txt with the corrections I made in the JSON, but it doesn't seem to work anymore... Maybe there's an option I'm not seeing?

EDIT: Okay, it seems I got confused somewhere along the way. I had spotted an error in one of my clips, so I wanted to replace it and make sure the change was taken into account. I clicked "Transcribe and process" again after swapping the audio in the voices folder, and then replaced whisper.json with the backup I had already corrected. That's when "Recreate dataset" stopped updating my train.txt.

Which raises the question: is it possible to replace audio clips once you've started transcribing, without having to correct the .json all over again?

Thank you :)


Okay, I'm closing the issue because I figured out my process. For anyone interested, here's what I think you have to do:

Case 1 - You want to add more audio clips after having started transcription

  • Add your data clips to the voice folder
  • In the training tab, check the "Skip existing" option, and then hit "Transcribe and process"
  • Make your corrections for the new clips in the already existing JSON
  • Hit "Recreate dataset"
  • The new corrections and the old ones should be in the train.txt
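For Case 1, or for the original question of reusing an already-corrected backup after re-running "Transcribe and process", the corrections could also be carried over with a small script instead of by hand. This is a minimal sketch that assumes whisper.json is a single JSON object keyed by clip filename (e.g. "clip_01.wav") with one transcription entry per clip. That layout, and the helper names merge_corrections and merge_files, are assumptions for illustration, not the tool's documented format. Entries for clips found in the backup replace the freshly transcribed ones, clips that only exist in the fresh file are kept so you can correct them, and clips that only exist in the backup are dropped:

```python
import json
from pathlib import Path


def merge_corrections(fresh: dict, corrected: dict) -> dict:
    """Return a copy of `fresh` in which every clip that also appears in
    `corrected` takes its hand-corrected entry instead.

    ASSUMPTION: both dicts map clip filenames to transcription entries.
    Clips only in `fresh` (newly added) keep their raw transcription;
    clips only in `corrected` (deleted audio) are left out.
    """
    merged = {}
    for name, entry in fresh.items():
        merged[name] = corrected.get(name, entry)
    return merged


def merge_files(fresh_path: Path, corrected_path: Path, out_path: Path) -> None:
    """Read both JSON files, merge them, and write the result to out_path.

    Both inputs are fully read before anything is written, so out_path
    may be the same file as fresh_path.
    """
    fresh = json.loads(fresh_path.read_text(encoding="utf-8"))
    corrected = json.loads(corrected_path.read_text(encoding="utf-8"))
    merged = merge_corrections(fresh, corrected)
    out_path.write_text(
        json.dumps(merged, indent=4, ensure_ascii=False), encoding="utf-8"
    )
```

Something like merge_files(Path("whisper.json"), Path("whisper_backup.json"), Path("whisper.json")) would then be run before hitting "Recreate dataset" (the backup filename is hypothetical). Note that a clip replaced under the same filename would get its old correction back, which is one reason to give a modified clip a new name as in Case 2.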

Case 2 - You want to modify an audio clip that was already transcribed

  • Modify your audio clip and put it in the voice folder under a different name (e.g. add an index to the filename)
  • Delete the former clip
  • In the training tab, check the "Skip existing" option, and then hit "Transcribe and process"
  • In the JSON, you will see both the new version of the clip and the old one. Simply delete the data block corresponding to the former clip
  • Hit "Recreate dataset"
  • In the train.txt, you should only see the latest version
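The manual step of deleting the former clip's data block from the JSON could also be scripted. This is a hedged sketch that again assumes whisper.json is keyed by bare clip filenames matching the files in the voice folder. prune_missing and prune_file are hypothetical helpers, and it is worth keeping a backup of the JSON before running anything like this:

```python
import json
from pathlib import Path


def prune_missing(entries: dict, present: set[str]) -> dict:
    """Keep only entries whose clip filename is in `present`.

    ASSUMPTION: the JSON keys are bare filenames such as "line_01.wav".
    """
    return {name: entry for name, entry in entries.items() if name in present}


def prune_file(json_path: Path, voice_dir: Path) -> list[str]:
    """Drop JSON entries for clips no longer in voice_dir, rewrite the file
    in place, and return the sorted names of the removed entries."""
    present = {p.name for p in voice_dir.iterdir() if p.is_file()}
    entries = json.loads(json_path.read_text(encoding="utf-8"))
    kept = prune_missing(entries, present)
    json_path.write_text(
        json.dumps(kept, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return sorted(set(entries) - set(kept))
```

Run after deleting the former clip and before hitting "Recreate dataset", this should leave only the renamed clip's entry in the JSON.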

Enjoy

Reference: mrq/ai-voice-cloning#389