I can't imagine 0-second files being anything other than badly cut clips. If you have enough data, I'd drop the worst of it.
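If it helps, here is a rough sketch of how the near-empty clips could be listed before deciding what to drop. It assumes the dataset is a flat folder of PCM WAV files; the function names and the 0.5 second cutoff are made up for illustration and are not part of any tool discussed here.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return wav.getnframes() / rate if rate else 0.0


def find_short_clips(folder, min_seconds=0.5):
    """List (file name, duration) for WAV files shorter than min_seconds.

    min_seconds is an arbitrary cutoff chosen for this sketch.
    """
    short = []
    for path in sorted(Path(folder).glob("*.wav")):
        try:
            duration = wav_duration_seconds(path)
        except (wave.Error, EOFError):
            # Unreadable or empty file: treat it as a bad clip.
            duration = 0.0
        if duration < min_seconds:
            short.append((path.name, duration))
    return short
```

Calling find_short_clips on the dataset folder only reports the suspects; deleting them (and their transcript lines) is left as a manual step so nothing is lost by accident.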
You might have botched your venv somehow, for example by ending up with Windows line endings (CRLF) in its files.
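A quick way to check for that, as a sketch: scan the venv (or the repo) for files that contain CRLF. The function name and the default file patterns are my own guesses at what is worth checking, so adjust them to your setup.

```python
from pathlib import Path


def files_with_crlf(root, patterns=("*.py", "*.sh", "*.cfg", "activate*")):
    """Return the sorted paths under root whose contents include CRLF.

    The default patterns are an assumption about which files matter.
    """
    hits = set()
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file() and b"\r\n" in path.read_bytes():
                hits.add(path)
    return sorted(hits)
```

If this returns anything for shell scripts in particular, converting those files to LF or simply recreating the venv is probably the least painful fix.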
I sure did.
I see https://github.com/m-bain/whisperX/blob/main/whisperx/transcribe.py has no_speech_threshold: Optional[float] = 0.6,
no_speech_threshold: float If the no_speech probability is higher…
I don't know if it's good or not, just brainstorming. This will be awful for long files. Although theoretically, if whisper can play nice with what it is given, then another tool can do the cutting.
Can we make the cuts for whisper? Say I take my 1-minute source and cut it into sentences. Can whisper then take each sentence and not cut it further? That's what comes to mind when thinking how…
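To make the "another tool does the cutting" idea concrete, here is a minimal sketch that finds cut points at pauses using a plain energy threshold. This is not how whisper or whisperX segment audio, just a stand-in heuristic; the function name and all thresholds are hypothetical, and it works on a list of integer samples rather than a file. Whether whisper then leaves each piece uncut is still the open question.

```python
import math


def split_on_silence(samples, rate, frame_ms=20, silence_rms=500, min_silence_ms=300):
    """Return (start, end) sample indices of the non-silent spans.

    A span is closed once at least min_silence_ms of consecutive frames
    have an RMS below silence_rms. All thresholds are illustrative.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    spans = []
    start = None            # first sample of the span being built
    last_voiced_end = None  # end of the most recent non-silent frame
    silent_run = 0          # consecutive silent frames inside a span
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= silence_rms:
            if start is None:
                start = pos
            last_voiced_end = pos + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # The pause is long enough: close the span at the last voiced frame.
                spans.append((start, last_voiced_end))
                start = None
                silent_run = 0
    if start is not None:
        spans.append((start, last_voiced_end))
    return spans
```

Each (start, end) pair could then be written out as its own clip and handed to the transcriber one at a time. Short pauses inside a sentence stay within one span, which is the point of min_silence_ms.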
The thing is, I have very little to complain about in the whisperx transcribe process. It is the splitting of the audio into chunks that is sus. I've seen on https://github.com/m-bain/whisperX that they…
If you're OOMing while generating, try lowering your Sample Batch Size in settings.
It's possible your fancy magic super duper accuracy boost is correct, but it is the whisper transcribe and…
Regarding latents and "Leveraging LJSpeech dataset for computing latents," does that mean that the latents are sensitive to the quality of the generated dataset by whisper/whisperx/whispercpp? So…