Missing dataset: whisper.json. #279

Closed
opened 2023-06-24 00:09:38 +07:00 by Atoli · 16 comments

I have read the documentation multiple times, including other sections, but I have no idea where whisper.json is supposed to come from.

https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Training
https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Generate

Nothing about creating a whisper.json file.

I did as instructed and put a WAV file in a folder under the voices folder.
Flonne is the folder that contains the WAV file, and I use it to generate, just as the Training section says.

According to the Training section, that same audio file is used for training; however, when I press "Transcribe and Process" I get the error shown in the image.

How do I generate this whisper.json file?

Something may have gone wrong with the transcription, please post your console log.

Here is the image; is this what you were asking for?

Extra information to replicate my issue:

The Flonne voice was obtained from the Mega link on the wiki:

https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Collecting-Samples

It is the Flonne folder in the Mega share.

I converted it to WAV with ffmpeg, since that's what the documentation mentions, and put it in the voices folder, in a subfolder named "Flonne".

System:

OS: Windows 10.
GPU: RTX 3060 12gb.
Python: 3.10.
Nvidia drivers: Latest.

It's the second-to-last message that indicates the root of the problem. It can't find the file, so it can't transcribe anything, so the whisper.json never gets made.

Thank you for the answer.

Why would it not find it?

Please take a look at this image: as you can see from the folder path, the file is there. What could the problem be?

Please run ffprobe on the file and post the results.

Sure.

C:\ffmpeg>ffprobe -loglevel 0 -show_format -show_streams Flonne.wav
[STREAM]
index=0
codec_name=pcm_s16le
codec_long_name=PCM signed 16-bit little-endian
profile=unknown
codec_type=audio
codec_tag_string=[1][0][0][0]
codec_tag=0x0001
sample_fmt=s16
sample_rate=44100
channels=2
channel_layout=unknown
bits_per_sample=16
initial_padding=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/44100
start_pts=N/A
start_time=N/A
duration_ts=11636116
duration=263.857506
bit_rate=1411200
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
DISPOSITION:captions=0
DISPOSITION:descriptions=0
DISPOSITION:metadata=0
DISPOSITION:dependent=0
DISPOSITION:still_image=0
[/STREAM]
[FORMAT]
filename=Flonne.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=263.857506
size=46544542
bit_rate=1411202
probe_score=99
TAG:encoder=Lavf59.34.101
[/FORMAT]

Is this good enough, or do you want me to use another parameter?

I created the WAV with this ffmpeg command:

ffmpeg -i Flonne.mp3 Flonne.wav

May I ask whether the slash direction matters? In the second image I posted, the path shows as ./voices/Flonne\Flonne.wav.
Windows uses \ but the UI uses /.

I wonder if that matters.
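On the slash question: Windows file APIs generally accept both / and \ as separators, so a mixed path such as ./voices/Flonne\Flonne.wav names the same file as an all-backslash one, and the mixed separators by themselves are unlikely to cause a "file not found" error. A minimal sketch using the standard library's pathlib.PureWindowsPath, which treats both characters as separators, illustrates this; the helper name normalize_windows_path is hypothetical, and this does not claim to reproduce how ai-voice-cloning builds its paths.

```python
from pathlib import PureWindowsPath

def normalize_windows_path(raw: str) -> str:
    """Return `raw` with Windows-style separators; '/' and '\\' are both accepted."""
    return str(PureWindowsPath(raw))

mixed = "./voices/Flonne\\Flonne.wav"      # mixed separators, as shown in the UI log
uniform = ".\\voices\\Flonne\\Flonne.wav"  # backslashes only

print(normalize_windows_path(mixed))                       # voices\Flonne\Flonne.wav
print(PureWindowsPath(mixed) == PureWindowsPath(uniform))  # True
```

The leading ./ disappears in the printed path because pathlib collapses single-dot components; both spellings refer to the same relative location.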

Hmm, it's a valid .wav file... it should be able to convert it from there. Can you try running whisperx on (or just whisper if that's what you have installed) and see if it throws an error?
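The observation that this is a valid .wav file can be cross-checked from the ffprobe fields posted above: for uncompressed PCM, bit rate = sample_rate × channels × bits_per_sample, and duration = duration_ts × time_base. A minimal sketch, assuming ffprobe's default [SECTION] ... [/SECTION] key=value output; the function names parse_ffprobe and pcm_summary are illustrative, not part of ffprobe or ai-voice-cloning.

```python
from fractions import Fraction
from typing import Dict, List, Optional

def parse_ffprobe(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Group ffprobe's default key=value output by [SECTION] ... [/SECTION] blocks."""
    sections: Dict[str, List[Dict[str, str]]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[/") and line.endswith("]"):
            current = None  # closing tag such as [/STREAM]
        elif line.startswith("[") and line.endswith("]"):
            current = {}  # opening tag such as [STREAM]
            sections.setdefault(line[1:-1], []).append(current)
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)  # split once: values may contain '=' or brackets
            current[key] = value
    return sections

def pcm_summary(stream: Dict[str, str]) -> Dict[str, float]:
    """Recompute bit rate and duration of an uncompressed PCM stream from its fields."""
    rate = int(stream["sample_rate"])
    channels = int(stream["channels"])
    bits = int(stream["bits_per_sample"])
    duration = int(stream["duration_ts"]) * Fraction(stream["time_base"])
    return {
        "bit_rate": rate * channels * bits,  # bits per second for raw PCM
        "duration_s": float(duration),
    }
```

Applied to the output above, bit_rate comes out as 44100 × 2 × 16 = 1411200 and duration_s as 11636116 / 44100 ≈ 263.857506 s, matching the bit_rate and duration fields ffprobe reported, so the header is self-consistent.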

I had no luck installing whisperx; I'm not even sure I installed the module correctly.

I installed it with pip install git+https://github.com/m-bain/whisperx.git --upgrade, followed by the steps from their repository:

$ git clone https://github.com/m-bain/whisperX.git
$ cd whisperX
$ pip install -e .

This repo was cloned in the location \ai-voice-cloning\modules.

To install regular whisper, I followed the GitHub repo's instructions: pip install git+https://github.com/openai/whisper.git. That worked; in the screenshot you can see it says it loaded whisper.

I also got another voice ("Melina", the newer one in this screenshot), added it to the voices folder and tested training, but I still get errors. I can generate, but when I try to train it suddenly cannot find/recognize the WAV file, which is very weird.

I noticed something weird though:

When generating it searched on ./voices\Melina\cond_latents_d1f79232.pth
When training it searched on ./voices/Melina\Melina.wav, not on ./voices\Melina\Melina.wav.

I wonder if that matters.

Here is a screenshot of how it looks.

I also have this issue, exactly as Atoli describes above, and I see the same mix of / and \ slashes. I just started with ai-voice-cloning yesterday. I removed and reinstalled it and did not see any errors. Everything else seems to work.

Try running whisper from the command line on the .wav file and see if it works.

> To install the normal whisper what i did was what the github repo said: pip install git+https://github.com/openai/whisper.git . That worked, in the screenshot you can see it says it loaded whisper. Try running `whisper` from the command line on the .wav file and see if it works.

Try running whisper from the command line on the .wav file and see if it works.

OK, I ran whisper Flonne.wav in the folder containing Flonne.wav.

It "generated" something that was about 400 MB (I don't know what) and then I got an error.

The screenshot shows all the details.

"[WinError 2] The system cannot find the file specified" is the same error as before, so it looks like the problem is not related to the cloning software but more likely something to do with your python install.

Weird.
Do you have any suggestions on how to fix it? I uninstalled Python 3.10 and installed Python 3.9, but the same error happens.

humanzoo also said they are having exactly the same problem, so it seems to be a new issue affecting multiple users.

Also, I have a question: the documentation says ffmpeg is needed for training, but it never says where to put ffmpeg after downloading it.

So I have the exe but nowhere to put it.

I found a discussion in the whisper repo saying this error might be caused by the system not finding ffmpeg:

https://github.com/openai/whisper/discussions/109

Could this be the problem?
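Why [WinError 2] is easy to misread: whisper loads audio by launching the ffmpeg program as a subprocess. If Python cannot locate that program, it raises FileNotFoundError, and on Windows the message is just "[WinError 2] The system cannot find the file specified", without saying which file was missing, so it reads like Flonne.wav is the problem. A minimal sketch with a deliberately nonexistent program name (hypothetical, as is the helper run_tool) shows that the error concerns the executable, not the audio file passed to it.

```python
import subprocess
from typing import List

def run_tool(args: List[str]) -> str:
    """Run an external program and describe the outcome instead of raising."""
    try:
        subprocess.run(args, capture_output=True, check=True)
        return "ok"
    except FileNotFoundError as exc:
        # Raised when the *program* (args[0]) cannot be located, not when an input
        # file named later in args is missing. On Windows the message is
        # "[WinError 2] The system cannot find the file specified".
        return f"missing executable {args[0]!r}: {exc}"
    except subprocess.CalledProcessError as exc:
        # The program ran but exited with an error (e.g. an unreadable input file).
        return f"{args[0]!r} exited with code {exc.returncode}"

# Hypothetical program name, chosen so it is certainly not installed.
print(run_tool(["definitely-not-ffmpeg-xyz", "-i", "Flonne.wav"]))
```

If ffmpeg were found but Flonne.wav were actually missing, the call would instead reach the CalledProcessError branch, because ffmpeg itself would start and then exit with an error.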

I believe you need to install it via pip so it can be used in the Python virtual environment. That could be the root cause of your problem.

Greetings.

I found the issue: ffmpeg was not on the PATH, so whisper could not find it. The error was apparently not that it couldn't find Flonne.wav, but that it couldn't find ffmpeg.exe.

humanzoo, the fix is to download ffmpeg and add the folder containing it to the PATH environment variable.

Here is how to do so: https://linuxhint.com/add-directory-to-path-environment-variables-windows/
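To check whether ffmpeg is actually reachable, Python's shutil.which performs the same PATH lookup that launching a program relies on. A minimal sketch, assuming ffmpeg.exe lives in C:\ffmpeg (a hypothetical location suggested only by the console prompt earlier in the thread; the helper names find_tool and prepend_to_path are also hypothetical). Prepending to os.environ["PATH"] affects only the current Python process and the programs it launches; it is a quick test, not a replacement for the system-wide setting described in the linked guide.

```python
import os
import shutil
from typing import Optional

def find_tool(name: str) -> Optional[str]:
    """Return the full path of `name` if it can be found through PATH, else None."""
    return shutil.which(name)

def prepend_to_path(directory: str) -> None:
    """Put `directory` first on PATH for this Python process and anything it launches.

    This does not change the system-wide PATH; that is set through the Windows
    environment-variable settings described in the linked guide.
    """
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")

if __name__ == "__main__":
    if find_tool("ffmpeg") is None:
        # Hypothetical folder: use whichever folder actually contains ffmpeg.exe.
        prepend_to_path(r"C:\ffmpeg")
    print(find_tool("ffmpeg") or "ffmpeg is still not reachable through PATH")
```

Changes made through the Windows environment-variable settings only reach programs started afterwards, so any console or launcher that was already open has to be restarted before ffmpeg is found.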

Atoli closed this issue 2023-06-26 14:55:28 +07:00

That worked! Thank you, Atoli. If you are on Windows, you must reboot after changing the PATH; at least I had to.

Reference: mrq/ai-voice-cloning#279