Whisper transcribing isn't doing all the files #265

nirurin · 2023-06-13T20:50:13Z

nirurin commented

2023-06-13 20:50:13 +00:00

I noticed that I was only able to set a "batch size" of 108, even though I knew I had at least 250 files in the training audio folder.
I went and checked, and train.txt only has 108 entries.

It seems that even though some of the files got sliced up (e.g. 1.wmv got cut into 1_000001.wmv, 1_000002.wmv, etc.), the transcription file only lists the original 1.wmv instead of the shorter sliced versions...

Edit: Which makes me realise it's keeping the un-sliced files along with the sliced ones, which seems wrong too.
Also, it's not ignoring all of the sliced files... some of the sliced ones are being transcribed, but a lot of them aren't. (And it's not a validation thing: I checked that file and there are only about 5 entries in there, so I'm still missing about 100.)
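
For anyone wanting to pin down exactly which files were skipped, here is a rough sketch that compares the audio folder against train.txt. It assumes each line of train.txt starts with the audio path followed by a `|` and the transcription text; that layout, the helper names, and the example paths are all assumptions for illustration, not something confirmed in this thread.

```python
from pathlib import Path


def listed_stems(train_txt_lines):
    """Collect file stems from the first pipe-separated field of each line.

    Assumes lines look like 'audio/1_000001.wmv|some text' (an assumption).
    """
    stems = set()
    for line in train_txt_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path_field = line.split("|", 1)[0]
        # Normalise Windows-style separators before taking the stem.
        stems.add(Path(path_field.replace("\\", "/")).stem)
    return stems


def missing_from_list(audio_names, train_txt_lines):
    """Return the audio file names that have no entry in the list, sorted."""
    listed = listed_stems(train_txt_lines)
    return sorted(name for name in audio_names if Path(name).stem not in listed)


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever your dataset lives.
    audio_dir = Path("training/myvoice/audio")
    train_txt = Path("training/myvoice/train.txt")
    if audio_dir.is_dir() and train_txt.is_file():
        names = [p.name for p in audio_dir.iterdir() if p.is_file()]
        lines = train_txt.read_text(encoding="utf-8").splitlines()
        for name in missing_from_list(names, lines):
            print(name)
```

Running the same check against the validation file as well should account for every file that was actually transcribed; whatever is left over is what got skipped.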

psammites commented

2023-06-14 02:55:50 +00:00

Are you using whisper or whisperx? How long are the files it's skipping?