Did something break after update? #35

Open
opened 2023-02-15 14:41:44 +00:00 by Armored1065 · 10 comments
Contributor

Updated with git pull and something seems to be broken

 Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\tortoise-tts\main.py", line 10, in <module>
    mrq.webui = mrq.setup_gradio()
  File "C:\Users\Administrator\Desktop\tortoise-tts\webui.py", line 819, in setup_gradio
    get_voice_list("./results/"),
  File "C:\Users\Administrator\Desktop\tortoise-tts\webui.py", line 551, in get_voice_list
    return sorted([d for d in os.listdir(dir) if os.path.isdir(os.path.join(dir, d)) and len(os.listdir(os.path.join(dir, d))) > 0 ]) + ["microphone", "random"]
FileNotFoundError: [WinError 3] The system cannot find the path specified: './results/'
Author
Contributor

After 37d25573ac, running start.bat just gets stuck at Initializating voice-fixer

Author
Contributor

Is there a way to trigger multiple GPUs if you have them? Currently it just uses one.

Owner

The system cannot find the path specified: './results/'

Well, it's missing the ./results/ folder. I suppose I forgot to create it when it doesn't already exist outside of a generation call. (Should be) remedied in commit 261beb8c91.
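For illustration, one way to guard against this kind of failure is to create the folder before listing it. This is a minimal sketch, not necessarily what the commit above does: the function body is adapted from the traceback, and the `os.makedirs` call and the `directory` parameter name are assumptions.

```python
import os

def get_voice_list(directory):
    # Assumption: create the folder up front so os.listdir() cannot raise
    # FileNotFoundError when the folder has never been created.
    os.makedirs(directory, exist_ok=True)
    # Same filter as in the traceback: keep only non-empty subfolders.
    voices = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
        and len(os.listdir(os.path.join(directory, d))) > 0
    )
    return voices + ["microphone", "random"]
```

With this guard, calling `get_voice_list("./results/")` on a fresh checkout returns just the two built-in entries instead of raising.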

running start.bat just gets stuck at Initializating voice-fixer

Haven't had any issues in testing. Either:

  • disable voicefixer under Settings
  • ensure you did in fact update with update.bat

I have it printing any errors thrown during initialization (even though it should just ignore voicefixer and disable it), since I don't have any other way to debug it.

Is there a way to trigger multiple GPUs if you have them?

Not elegantly. There are some "things" you can expose to the GPT2 autoregressive model so it runs in parallel across a list of devices, but I haven't bothered with it.

Author
Contributor

The ./results/ folder and voicefixer issues seem to be related and are solved, thank you.

However there seems to be a new problem, and I'm not sure what's going on here with connections?

It showed up after Generation took 1007.0080227851868 seconds, saved to..

C:\Users\Administrator\Desktop\tortoise-tts\tortoise-venv\lib\site-packages\gradio\processing_utils.py:236: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\mrq-tts\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\ProgramData\Anaconda3\envs\mrq-tts\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\mrq-tts\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\ProgramData\Anaconda3\envs\mrq-tts\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Re: GPU, it's using the less powerful of the two I have, which takes a long time to run this and heats up badly, while the more powerful one sits idle, unfortunately.

Owner

However there seems to be a new problem, and I'm not sure what's going on here with connections?

Those usually happen for me too. They're innocuous, so I haven't bothered resolving them.

it's using the less powerful one that I have which is taking a lot of time to run this and heats up bad, meanwhile the more powerful one is sitting idle, unfortunately.

Ah, that can be remedied in a few ways:

  • there's an environment variable that explicitly controls which devices are exposed, and in what order. You can set it before the python call in the start script:
    • Windows, start.bat: set CUDA_VISIBLE_DEVICES=1,0
    • Linux, start.sh: export CUDA_VISIBLE_DEVICES=1,0
  • I could add an argument/setting to override the device name passed to Torch.
    • by default, for CUDA environments, it just passes "cuda" to Torch, and Torch will hand back whichever CUDA device it wants to.
    • this argument will pass whatever string you want instead; for example, setting it to cuda:1 will pass that to Torch and get the device at index 1 (the second device).
    • it's no real problem to add, since I wanted something to force CPU/CUDA/DirectML anyway
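The second option could look roughly like the sketch below. It is only an illustration of the idea, not the actual webui.py code: `pick_device_name` and its parameters are made-up names, and falling back to "cpu" when CUDA is unavailable is an assumption.

```python
def pick_device_name(override=None, cuda_available=False):
    """Return the device string to hand to Torch.

    Hypothetical helper: an explicit override (e.g. "cuda:1" or "cpu")
    always wins; otherwise fall back to the generic default.
    """
    if override:
        return override
    # Plain "cuda" lets Torch choose the device on its own.
    return "cuda" if cuda_available else "cpu"


# Intended use with Torch installed (not executed here):
#   import torch
#   device = torch.device(pick_device_name("cuda:1", torch.cuda.is_available()))
```

Keeping the override as a plain string means the same setting can force CPU mode as well as select a specific GPU.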

However, for figuring out which device, I might just add a script that's simply the following to get your device IDs:

import torch
# List every CUDA device index alongside its name.
devices = [f"cuda:{i} => {torch.cuda.get_device_name(i)}" for i in range(torch.cuda.device_count())]
print(devices)

When I get some free time, I'll try and add it in.

Owner

Added argument device-override/setting Device Override in commit 7a4460ddf0. Pass a string returned from running list_devices.py, for example, pass cuda:1 into this box. I have not extensively tested this, as I do not have a multi-NVIDIA GPU setup, just used it to test forcing CPU-mode.

Author
Contributor
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\tortoise-tts\webui.py", line 999, in run_generation
    sample, outputs, stats = generate(
  File "C:\Users\Administrator\Desktop\tortoise-tts\webui.py", line 163, in generate
    key = int(match[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:
...
gradio.exceptions.Error: 'list index out of range'

Not sure if I'm doing it right,

  1. Ran it by updating ❌
  2. Ran it by editing the start.bat file and adding cuda devices ❌
  3. Ran it passing cuda:1 ❌

Just want it to use all GPUs available to make things fast.
Owner

Strange, wonder what's bugging it out there. It's an (unrelated) error related to the incrementing filename thing that I tried to fix a few hours ago. I don't think you'd have to clear out your ./results/ folder, since it should work without needing to.

If you can, edit webui.py with the following, then report back with what it prints:

        match = re.findall(rf"^{voice}_(\d+)(?:.+?)?{extension}$", filename)
        print(voice, extension, filename, match)
        if len(match) == 0:
            continue
        key = int(match[0])

so the block looks like: (screenshot attached)

It'll debug print some info that might help me figure out why it breaks, and at worst it'll just skip it.
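As a self-contained illustration of the skip-on-no-match idea, here is a rough sketch. The surrounding loop in webui.py is not shown in this thread, so `next_index`, its parameters, and the "highest index plus one" logic are assumptions; the `re.escape` calls are also an addition, so that characters like the dot in the extension are matched literally.

```python
import re

def next_index(filenames, voice, extension=".wav"):
    """Hypothetical helper: find the next free numeric suffix for a voice."""
    pattern = re.compile(
        rf"^{re.escape(voice)}_(\d+)(?:.+?)?{re.escape(extension)}$"
    )
    highest = 0
    for filename in filenames:
        match = pattern.findall(filename)
        # Files that do not follow the naming scheme are skipped
        # instead of raising IndexError on match[0].
        if len(match) == 0:
            continue
        highest = max(highest, int(match[0]))
    return highest + 1
```

Any stray file in the folder is then ignored rather than aborting the whole generation.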

Separately, you only need to do one of 2 (the set CUDA_VISIBLE_DEVICES thing) or 3, not both, since 2 will, theoretically, also change the index order that 3 refers to.
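To make the index-order point concrete: CUDA renumbers the devices it exposes in the order they are listed in CUDA_VISIBLE_DEVICES. The sketch below only models that renumbering for a plain comma-separated list of integer indices; `visible_device_map` is a made-up helper and it does not query any real hardware.

```python
def visible_device_map(cuda_visible_devices):
    """Map the names a process sees ("cuda:0", ...) to physical GPU indices.

    Hypothetical helper; assumes a simple list such as "1,0".
    """
    ids = [part.strip() for part in cuda_visible_devices.split(",") if part.strip()]
    return {f"cuda:{i}": int(physical) for i, physical in enumerate(ids)}


print(visible_device_map("1,0"))
```

So with `CUDA_VISIBLE_DEVICES=1,0` set, passing `cuda:1` as the override would select physical GPU 0, which is why combining both approaches is confusing.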

Author
Contributor

Did a normal run with latest pull,

  1. For default page load where no radio buttons are active after start.bat is run
Error occurred while tring to initialize voicefixer: PytorchStreamReader failed reading zip archive: failed finding central directory

Output is still generated


2. When 'High Quality' is selected

Something went wrong
'CUDA out of memory. Tried to allocate ... reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF'

No output

Edit: item 2 is resolved. Turns out it was because the voices/XXX/20.wav file was 2 minutes long and it didn't like it.

Owner

Error occurred while tring to initialize voicefixer: PytorchStreamReader failed reading zip archive: failed finding central directory

Sounds like you closed the process while it was downloading the model for voicefixer. In that case, delete the %USERPROFILE%\.cache\ folder.
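If deleting the whole cache feels heavy-handed, a rough way to look for the culprit first is to check which cached model files are not complete zip archives, since the error above is about a missing zip central directory. This is only a sketch under assumptions: `find_truncated_archives` is a made-up helper, the file suffixes are a guess, and older non-zip checkpoints would be flagged as well.

```python
import os
import zipfile

def find_truncated_archives(cache_dir, suffixes=(".pth", ".pt", ".ckpt")):
    """Hypothetical helper: list model files that are not valid zip archives."""
    broken = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # is_zipfile() looks for the end-of-central-directory record,
            # which an interrupted download will not have.
            if name.endswith(suffixes) and not zipfile.is_zipfile(path):
                broken.append(path)
    return sorted(broken)


# Example call (path from the comment above):
#   find_truncated_archives(os.path.expandvars(r"%USERPROFILE%\.cache"))
```

Deleting only the files it reports should force a fresh download on the next start, though removing the whole folder as suggested above is the simpler route.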

Reference: mrq/tortoise-tts#35