Recreate dataset after correcting whisper.json doesn't overwrite the train.txt #389

Hey there,

After hitting "Transcribe and process", I went ahead and manually corrected the whisper.json. I thought that hitting "Recreate Dataset" would automatically update the train.txt with the corrections I made in the JSON, but somehow it doesn't work anymore... Maybe there's an option I'm not seeing?

EDIT: Okay, it seems I got confused somewhere along the line. I spotted an error in one of my clips, so I wanted to replace it and make sure the change was taken into account. What I did was replace the audio in the voices folder, click "Transcribe and process" again, and then swap in the backup whisper.json I had already corrected. That's when "Recreate dataset" stopped updating my train.txt.

Which raises the question: is it possible to replace audio clips once you've started transcription, without having to correct the JSON all over again?

Thank you :)

Okay, I'm closing the issue because I figured out my process. For anyone interested, here's what I think you have to do:

Case 1 - You want to add more audio clips after transcription has already started

- Add your new clips to the voices folder
- In the training tab, check the "Skip existing" option, then hit "Transcribe and process"
- Add your new corrections to the existing JSON
- Hit "Recreate dataset"
- The train.txt should now contain both the new and the old corrections
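If you keep a corrected backup of whisper.json, reapplying it after a new "Transcribe and process" run (the situation in the EDIT above) could be scripted instead of done by hand. The following is a minimal sketch, assuming whisper.json is a JSON object keyed by clip filename; the functions `merge_transcriptions` and `merge_files` are hypothetical helpers, not part of the tool. Run it before hitting "Recreate dataset".

```python
import json
from pathlib import Path


def merge_transcriptions(corrected: dict, fresh: dict) -> dict:
    """Return `fresh` with every entry that also exists in `corrected`
    replaced by the hand-corrected version.

    Assumption (hypothetical): whisper.json maps each clip's filename to an
    object holding its transcription. Check your own file first.
    """
    merged = dict(fresh)  # start from the latest transcription run
    for clip, entry in corrected.items():
        if clip in fresh:  # drop corrections for clips that no longer exist
            merged[clip] = entry
    return merged


def merge_files(corrected_path: str, fresh_path: str, out_path: str) -> None:
    """Read both JSON files, merge them, and write the result to out_path."""
    corrected = json.loads(Path(corrected_path).read_text(encoding="utf-8"))
    fresh = json.loads(Path(fresh_path).read_text(encoding="utf-8"))
    merged = merge_transcriptions(corrected, fresh)
    Path(out_path).write_text(
        json.dumps(merged, indent=4, ensure_ascii=False), encoding="utf-8"
    )
```

Note that a correction is reapplied purely by filename, so if you replace a clip's audio but keep its name, the old correction would be carried over to the new audio. Renaming the replacement clip, as in Case 2, avoids that.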

Case 2 - You want to modify an audio clip that was already transcribed

- Modify your audio clip and put it in the voices folder under a different name (e.g. add an index to the filename)
- Delete the former clip
- In the training tab, check "Skip existing", then hit "Transcribe and process"
- In the JSON, you will see entries for both the new version of the clip and the old one. Delete the data block corresponding to the former clip
- Hit "Recreate dataset"
- The train.txt should now contain only the latest version
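Deleting the stale data block in Case 2 could also be automated. The following is a minimal sketch, assuming each key of whisper.json is the clip's filename relative to the voices folder; `prune_missing` and `prune_file` are hypothetical helpers, not part of the tool. It removes every entry whose audio file is no longer on disk, so run it after deleting the former clip and before "Recreate dataset".

```python
import json
from pathlib import Path


def prune_missing(transcriptions: dict, voice_dir: Path) -> dict:
    """Keep only entries whose audio clip still exists in voice_dir.

    Assumption (hypothetical): keys are filenames relative to the voices
    folder. Verify this against your own whisper.json.
    """
    return {
        clip: entry
        for clip, entry in transcriptions.items()
        if (voice_dir / clip).is_file()
    }


def prune_file(json_path: str, voice_dir: str) -> list:
    """Rewrite json_path in place; return the sorted names of removed clips."""
    path = Path(json_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    kept = prune_missing(data, Path(voice_dir))
    path.write_text(json.dumps(kept, indent=4, ensure_ascii=False), encoding="utf-8")
    return sorted(set(data) - set(kept))
```

Keeping a backup copy of whisper.json before running this is a good idea, since it overwrites the file in place.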

Enjoy
