"System error" when using Bark as TTS #306

Open
opened 2023-07-11 11:14:05 +00:00 by NekoDArk · 8 comments

When trying to run the Bark TTS mode, I always get this error, no matter what audio files I use:
```
D:\voice\ai-voice-cloning>call .\venv\Scripts\activate.bat
Whisper detected
Error: No module named 'vall_e'
Bark detected
Vocos detected
Error: No module named 'hubert'
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading Bark...
Loaded TTS, ready for generation.
[1/1] Generating line: Test
Using as reference: ./training/tboiNarrator/audio/see 4ever 1_00000.wav I can see forever.
Traceback (most recent call last):
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
response = fn(*args)
File "D:\voice\ai-voice-cloning\src\webui.py", line 94, in generate_proxy
raise e
File "D:\voice\ai-voice-cloning\src\webui.py", line 88, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "D:\voice\ai-voice-cloning\src\utils.py", line 323, in generate
return generate_bark(**kwargs)
File "D:\voice\ai-voice-cloning\src\utils.py", line 465, in generate_bark
gen = tts.inference(cut_text, **settings )
File "D:\voice\ai-voice-cloning\src\utils.py", line 263, in inference
self.create_voice( voice )
File "D:\voice\ai-voice-cloning\src\utils.py", line 212, in create_voice
wav, sr = torchaudio.load(audio_filepath)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 221, in load
with soundfile.SoundFile(filepath, "r") as file_:
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\soundfile.py", line 658, in init
self._file = self._open(file, mode_int, closefd)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening './training/tboiNarrator/audio/see 4ever 1_00000.wav': System error.
```

Owner

I'll go out on a limb and say I forgot to document that you need to first transcribe your voice files under `Training > Prepare Dataset`, since Bark requires a transcription of the reference audio it's being fed. Make sure it also keeps the audio it copies under `./training/tboiNarrator/audio/`. You may or may not need to do this while having `--tts-backend="tortoise"`, but I don't think there's any backend-exclusive code for it.

However, I'm not sure you'll have much luck with voice cloning under Bark. While the `random` option seems to work fine recently, I still couldn't figure out how to get it to clone semi-competently.


> raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
> soundfile.LibsndfileError: Error opening './training/tboiNarrator/audio/see 4ever 1_00000.wav': System error.

That means it couldn't find the file (or it doesn't have permission to open it). Examine the filenames in your `train.txt` and check that the files exist in the locations specified.
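
As a quick way to check that, here's a minimal sketch (not from this repo; it assumes the usual `audio_path|transcription` line format in `train.txt` and uses this issue's voice name as an example):

```python
# Minimal sketch: flag entries in train.txt whose audio files don't exist on
# disk. Assumes "audio_path|transcription" lines; adjust paths to your setup.
import os

voice = "tboiNarrator"  # this issue's example voice
train_txt = f"./training/{voice}/train.txt"

with open(train_txt, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line or "|" not in line:
            continue
        audio_path = line.split("|", 1)[0]
        # Entries may be relative to the dataset folder; check both spots.
        candidates = [audio_path, os.path.join("training", voice, audio_path)]
        if not any(os.path.isfile(p) for p in candidates):
            print(f"line {lineno}: missing audio file {audio_path!r}")
```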

Author

> I'll go out on a limb and say I forgot to document that you need to first transcribe your voice files under `Training > Prepare Dataset`, since Bark requires a transcription of the reference audio it's being fed. Make sure it also keeps the audio it copies under `./training/tboiNarrator/audio/`. You may or may not need to do this while having `--tts-backend="tortoise"`, but I don't think there's any backend-exclusive code for it.
>
> However, I'm not sure you'll have much luck with voice cloning under Bark. While the `random` option seems to work fine recently, I still couldn't figure out how to get it to clone semi-competently.

They are transcribed the way you described and are in the right folders, as far as I'm aware.

Owner

Ah, I see now. Oops.

It's rather silly to have to do, but you'll need to go into `Training > Prepare Dataset` and click `(Re)Slice Audio`.

If I remember how I implemented it, the routine will read the `whisper.json` (rather than the `train.txt`) to find transcriptions and, since it checks each segment rather than each whole file, it expects the sliced audio to exist (even if it doesn't need to be sliced).


Whenever I get a chance, I'll rework it to instead check `train.txt` first and leverage it, since that should always work.
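
Purely as a sketch of what that rework could look like, not the actual code in `src/utils.py` (the helper name and the pipe-separated `train.txt` format are assumptions):

```python
import os

# Hypothetical helper: resolve a reference clip's transcription from train.txt
# first, so whole (unsliced) audio files can be used directly, and only fall
# back to the whisper.json segment lookup when no entry matches.
def get_transcription(voice, audio_filepath):
    train_txt = f"./training/{voice}/train.txt"
    if not os.path.isfile(train_txt):
        return None
    with open(train_txt, encoding="utf-8") as f:
        for line in f:
            if "|" not in line:
                continue
            path, text = line.strip().split("|", 1)
            if os.path.basename(path) == os.path.basename(audio_filepath):
                return text
    return None  # caller falls back to whisper.json segments
```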

Owner

I lied, I didn't even need to do that, as I was extremely naively assuming things were sliced.

Fixed in commit e2a6dc1c0a. You shouldn't need to slice your audio beforehand.

Author

Ah, that did the trick. Thank you

Author

But now I get a new error that says "
[1/1] Generating line: Hello
Using as reference: ./training/tboiNarrator/audio/speed down 2_00000.wav Speed down.
Traceback (most recent call last):
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
response = fn(*args)
File "D:\voice\ai-voice-cloning\src\webui.py", line 94, in generate_proxy
raise e
File "D:\voice\ai-voice-cloning\src\webui.py", line 88, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "D:\voice\ai-voice-cloning\src\utils.py", line 323, in generate
return generate_bark(**kwargs)
File "D:\voice\ai-voice-cloning\src\utils.py", line 465, in generate_bark
gen = tts.inference(cut_text, **settings )
File "D:\voice\ai-voice-cloning\src\utils.py", line 268, in inference
semantic_tokens = text_to_semantic(text, history_prompt=voice, temp=text_temp, silent=False)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\bark\api.py", line 25, in text_to_semantic
x_semantic = generate_text_semantic(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\bark\generation.py", line 394, in generate_text_semantic
history_prompt = _load_history_prompt(history_prompt)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\bark\generation.py", line 364, in _load_history_prompt
history_prompt = np.load(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\numpy\lib\npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\voice\\ai-voice-cloning\\venv\\lib\\site-packages\\bark\\assets\\prompts\\tboiNarrator.npz'
```

And using a random voice:

```
[1/1] Generating line: Hello
Generating line took 1.8284106254577637 seconds
Traceback (most recent call last):
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\voice\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
response = fn(*args)
File "D:\voice\ai-voice-cloning\src\webui.py", line 94, in generate_proxy
raise e
File "D:\voice\ai-voice-cloning\src\webui.py", line 88, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "D:\voice\ai-voice-cloning\src\utils.py", line 323, in generate
return generate_bark(**kwargs)
File "D:\voice\ai-voice-cloning\src\utils.py", line 528, in generate_bark
audio_cache[name]['output'] = True
KeyError: '00002_1'
"

Owner

The first error's due to some jank in requiring Bark to be installed under `./modules/bark/`. I fixed this in commit ac645e0a20.

As for the second one, I'm not sure what's causing it, since I haven't hit that issue once in testing, but I added a possible fix for it anyway. I'll see if I can trigger it when I get a chance.
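
For context on the first traceback: Bark resolves a bare `history_prompt` name against its own installed package directory, so before the fix a manual workaround was to drop the generated voice prompt into that folder. A hedged sketch of that workaround (the source `.npz` location below is an assumption, not necessarily where the repo writes it):

```python
import os
import shutil

import bark

# Where the traceback shows Bark looking for a named prompt:
#   <venv>/Lib/site-packages/bark/assets/prompts/tboiNarrator.npz
prompts_dir = os.path.join(os.path.dirname(bark.__file__), "assets", "prompts")

# Hypothetical location of the .npz produced for the cloned voice.
generated_npz = "./training/tboiNarrator/tboiNarrator.npz"
shutil.copy(generated_npz, os.path.join(prompts_dir, "tboiNarrator.npz"))
```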
