FileNotFoundError immediately after starting training #3
I'm not sure if this is a problem with my copy of the repo, but I get this error after I generate a configuration and try to run training with it. I don't think the problem is with my dataset structure, since the same dataset works fine with 152334H's Colab notebook.
The error's probably too vague to pinpoint the problem immediately, but if any other user's able to run training just fine on their computer, please tell me, and I'll try out fixes on my side.
(Also, sorry if you're not yet done implementing the entire training code, and I'm testing too early).
Should be fixed in 0dd5640a89. It was quick to figure out the issue, since past-me had the brain to at least have it print the command it executes: `call.\train.bat ./training/TestTraining.yaml`

Several brain worms went wrong with that line:

- the missing space between `call` and `.\train.bat`
- `call` only seemed to work when I was trying to make `shell=True` work
- `.\\train.bat` actually doesn't work as well for `subprocess.Popen`
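For anyone curious, here is a minimal sketch of the kind of invocation being described: launching the batch script through `subprocess.Popen` without `shell=True`. The script name, YAML path, and output handling below are assumptions for illustration, not the repo's actual code:

```python
import subprocess

# Hypothetical command: launch the Windows training script with its YAML config.
# The script name and YAML path are placeholders, not the repo's actual values.
cmd = ["train.bat", "./training/TestTraining.yaml"]

# Passing the arguments as a list avoids the quoting/`call` gymnastics that
# shell=True needed, and keeps the path free of doubled backslashes.
process = subprocess.Popen(
    cmd,
    cwd=".",                      # run from the repo root so relative paths resolve
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

# Stream the trainer's output so it can be echoed back to the console/UI.
for line in iter(process.stdout.readline, b""):
    print(line.decode(errors="ignore"), end="")
process.wait()
```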
Nah, you're in the right time to test. I finished it up last night, but don't really have any real way to test it. The Colab notebook I was using did that `Busy` "disconnect" and lost my progress when training something. I definitely need more people to try and break it for me.

I think commit 2615cafd75 broke something, because now I get this error while running start.bat:

EDIT: Also, I just noticed that the setup_training.bat script, when run by setup-cuda.bat, clones the training repo to a temporary folder. Thus, all dependencies are installed correctly, but the contents of the repo aren't copied to the ai-voice-cloning folder. So when I try to run training, I get
Did you manually pull with `git pull`? You'll need to re-install dependencies, as I updated mrq/tortoise-tts to allow selecting which autoregressive model to load. Do so with the update script.

I don't exactly follow. I'll probe it later, but just move the `dlas` folder to the right place then.

Sorry about that, several little brain worms happened; all should be fixed in commit 996e5217d2. Turns out anything after `deactivate` does not get called, at all.

Thanks for the fix! I still get this error.
I've run `update.bat`, `update-force.bat`, and `setup-cuda.bat` one by one, but if it's working as intended for others, I'll try doing a clean install of the repo tomorrow.

Run this to force install mrq/tortoise-tts in pip:
That unfortunately didn't work. Then I ran
and now it works. Weird...
Strange. You didn't happen to migrate over your `tortoise-venv` folder, did you?

If you did, I can sort of make some assumptions on what happened:

- when setup runs `python setup.py install`, it'll install (copy) the version of tortoise as it existed at setup time into its venv folder
- later updates won't replace that copy unless pip is told to upgrade it with `-U`

I might need to add a post-migration script to do what you did to fix it (uninstall then reinstall), or just explicitly say to not copy over the venv.
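One hedged way to check whether a migrated venv is still serving a stale copy of tortoise instead of the current checkout (the package name `tortoise` is taken from the `No module named 'tortoise'` error later in this thread; everything else here is just a diagnostic sketch, not project code):

```python
# Print where the `tortoise` package is actually being imported from.
# If the path points into tortoise-venv/.../site-packages rather than the
# tortoise-tts checkout, the venv is still serving the copy made at setup time.
import tortoise

print(tortoise.__file__)
```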
I had this error on Windows. I fixed it by dropping ffmpeg.exe into the root folder of the repo.
Yup, that's exactly what I did lol
I'm doing a fresh install now, as even after that problem I had another error while training... If reinstalling doesn't fix that, I'll post it.
Now there's a fresh new error immediately after starting training:
It's definitely the last few commits that caused this, because I did a fresh install about half an hour ago, and ran the update script just now. After the fresh install, I was getting this error (after loading all necessary models for training, just before the actual training starts):
That looks like an issue that crops up during training itself rather than the web UI. I'm assuming it offers a code block to try and reproduce it:
I just ran it for shits and grins and nothing broke for me on my dingy 2060.
I'm not too sure what exactly would lead to a weird driver state like that, but as much as I hate to suggest it, I'll suggest (in no particular order):

- in `Settings`, check `Defer TTS Load` and restart the UI to train again

I doubt it's a YAML training configuration issue, as it would've complained if:
And as much as I hate providing StackOverflow references, this one mentions it being a rather vague OOM issue, to which I would suggest enabling `Defer TTS Load`, restarting (because forcing TorToiSe to unload and calling GC just doesn't actually make it deallocate) to free up some VRAM, and lowering your batch size.

Can you give me the parameters you used? I have almost the same GPU (2060 Laptop version). The program suggested a batch size of 64, which I reduced to 32, and I left all other parameters as is.
After loading the dvae, it did some "3D work", and then ran out of memory.
Also, I get this just after loading the dvae.
Yeah... you're going to have to CBT yourself and use a Colab notebook to train. I haven't gotten it to work locally on my 2060 at all, as even a batch size of 4 will cause it to OOM. I'm guessing it's a mix of all the shit that gets loaded during training and enough fragmentation that even something small like 20MiB can't get allocated.
I even tried that max_split_size_mb override, but I don't think it does anything. I could be trying to set it wrong, but it hasn't given any different results.
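For reference, one way the `max_split_size_mb` override is commonly applied is through PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable, set before anything touches CUDA; the value below is only an example, and whether it actually helps here is untested:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be in the environment before the training process
# initializes the CUDA caching allocator, or the override silently does nothing.
# max_split_size_mb limits how large a cached block can get before being split,
# which can help with fragmentation-style OOMs; 128 is an arbitrary example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the variable so the allocator sees it

print(torch.cuda.is_available())
```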
Oh ok, I'll just stick to the colab then.
By the way, I think ec550d74fd broke the notebook, because now it says `No module named 'tortoise'`. For some reason, the tortoise-tts folder doesn't get copied to the ai-voice-cloning folder in Colab. Git cloning it there manually also doesn't seem to fix it.

EDIT: I moved `ai-voice-cloning/tortoise-tts/tortoise` to `ai-voice-cloning/tortoise`, and now it can locate the `api.py` file.

Yeah, I just realized that when I was fucking about with my Colab. I'll need to update it.
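As an alternative to moving the folder, here is a hedged sketch of pointing Python at the nested checkout instead; the `/content` path is an assumption about the Colab layout described above:

```python
import sys

# Make the nested checkout importable without moving folders around, so that
# `tortoise.*` imports resolve. The /content prefix assumes the Colab layout
# described above (ai-voice-cloning/tortoise-tts/tortoise/...).
sys.path.insert(0, "/content/ai-voice-cloning/tortoise-tts")

import tortoise.api  # should now resolve to tortoise-tts/tortoise/api.py
```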
Crashed out before I forgot to mention it: notebook updated in 3891870b5d.