Error when running start.bat #6
Reference: mrq/ai-voice-cloning#6
I get this error when running start.bat:
I am running:
- Python 3.9
- RTX 3080
You'll need to run:
Just for my curiosity: did you migrate your .\tortoise-tts\tortoise-venv over to .\ai-voice-cloning\venv\? Did you use the update.bat script to update earlier, or just a git pull?

Either way, I updated mrq/tortoise-tts to add in autoregressive model loading, and I realized after the fact that one of the problems with splitting the two up is that you'll need to re-install it through PIP every time I update that repo too. I might have a better way to go about it in the future instead.
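A minimal sketch of that re-install, assuming it is run from inside .\ai-voice-cloning\venv\ and that the package is pulled straight from the mrq/tortoise-tts remote (the URL is an assumption; use whichever remote your setup script points at):

```python
# Hedged sketch: force-reinstall the tortoise-tts package into the active venv
# after mrq/tortoise-tts is updated. The repository URL is an assumption.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "-U", "--force-reinstall",
     "git+https://git.ecker.tech/mrq/tortoise-tts"],  # assumed remote
    check=True,
)
```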
I'm still getting the same error:
This is a clean install; no files were migrated over.
I have used update.bat as well as update-force.bat before running start.bat.
Strange.
I suppose some surgery is needed. Save tortoise/api.py and place it under .\venv\Lib\site-packages\tortoise\api.py. I'm not sure why it's not updating even with the -U flag, although I wonder if you really need to pip uninstall tortoise first, like someone else mentioned earlier.

I'll look into a better way of integrating mrq/tortoise-tts to avoid these issues in the future.
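A minimal sketch of that surgery, assuming a local mrq/tortoise-tts checkout sits next to this repo (the source path is an assumption; the destination is the one named above):

```python
# Hedged sketch: overwrite the pip-installed tortoise/api.py inside this repo's
# venv with the updated copy from a local mrq/tortoise-tts checkout.
import shutil
from pathlib import Path

src = Path(r".\tortoise-tts\tortoise\api.py")            # assumed checkout location
dst = Path(r".\venv\Lib\site-packages\tortoise\api.py")  # destination named above
shutil.copyfile(src, dst)
print(f"copied {src} -> {dst}")
```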
Overwriting api.py seems to have worked.
Upon trying to prepare a dataset, I get the following error:
I have already set up ffmpeg in my environment path variables.
That's some other funky thing that sometimes crops up, especially in a colab notebook, and I swear one method worked while the next time it needed a different remedy.
Try:
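Purely as a hedged check (not necessarily the remedy being suggested here), one way to confirm that the Python environment running the web UI can actually see ffmpeg on PATH:

```python
# Hedged check: does this Python environment see ffmpeg on PATH?
import shutil
import subprocess

ffmpeg = shutil.which("ffmpeg")
print("ffmpeg found at:", ffmpeg)
if ffmpeg:
    # Print the version to confirm the binary actually runs.
    subprocess.run([ffmpeg, "-version"], check=True)
```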
I managed to prepare the dataset and generate configuration. However, I have been stuck at this step for about an hour now:
Another thing to note is that turning on voice fixer also gives the error shown above.
Right, I forgot to have it still print to console if Verbose Output or whatever is unchecked. Restart the UI, run training with Verbose Output checked, and see what it's getting hung up on. I'll push a quick commit to print all output anyways; commit 485319c2bb will restore it to print to console regardless of the setting.

As for the voicefixer thing, as per the documentation, the download was interrupted and for some reason it's not smart enough to restart downloading on its own. Open %USERPROFILE%\.cache\ and delete voicefixer.
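A minimal sketch of that cleanup, assuming the cache sits at the default location quoted above:

```python
# Hedged sketch: remove the partially-downloaded voicefixer cache so the
# models re-download from scratch on the next run.
import shutil
from pathlib import Path

cache = Path.home() / ".cache" / "voicefixer"  # i.e. %USERPROFILE%\.cache\voicefixer
if cache.exists():
    shutil.rmtree(cache)
    print(f"deleted {cache}")
else:
    print(f"nothing to delete at {cache}")
```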
I deleted the voicefixer folder and upon restarting, it is stuck on this step for about 10 minutes now:

Ignore what I wrote above, I restarted it and this time it downloaded successfully.
I proceeded to run the training with Verbose Output checked. This is the output it is stuck on:
I'll assume you're OOMing (system RAM, not VRAM). Outside of the obvious of closing out processes, I'll suggest checking Defer TTS Load in the web UI and restarting the UI, just to make sure nothing TTS-related gets loaded during training.

I have closed most processes, checked Defer TTS Load, and tried to run it again. I am stuck on this error:
Given the revamped configuration generation process and (attempting to) unload all models on training, you might be able to get it working again if you lower the Mega Batch Factor or whatever setting down to 1.

Still no dice, I'm afraid. It keeps giving me PyTorch/memory OOM errors no matter how low I tweak the settings. For now I'll have to use colab to train, I suppose.
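For reference on the Mega Batch Factor suggestion above, a hedged sketch of dialing the value down by hand in the generated training YAML; the config path and the exact key name (mega_batch_factor) are assumptions about what the configuration generator emits:

```python
# Hedged sketch: set every "mega_batch_factor" key in the generated training
# YAML to 1. The config path and key name are assumptions.
import yaml

CONFIG = "./training/finetune.yaml"  # hypothetical path to the generated config

def set_key(node, key, value):
    """Recursively set `key` to `value` wherever it appears in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                node[k] = value
            else:
                set_key(v, key, value)
    elif isinstance(node, list):
        for item in node:
            set_key(item, key, value)

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)
set_key(cfg, "mega_batch_factor", 1)
with open(CONFIG, "w") as f:
    yaml.safe_dump(cfg, f)
```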
I guess even 3080s can't train it, as there's another user that can't train off a 3080 either.
I'll cobble together a way to load and finetune it as float16 and see if that gets VRAM consumption down, maybe. I only worry about the performance/quality problems from it, but I suppose it's better than nothing.
As I mentioned in #17 (comment), I added an experimental way to train fully at half-precision that will convert the original model to one at float16. It fails with 8GiB a few steps into training, but peaks at 52% VRAM usage on a machine with 16GiB of VRAM, so it might work on a 3080. You're welcome to try, but I have zero guarantees that it'll work or produce usable output yet, as I still need to actually finetune a model at half precision.
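As a rough illustration of the float16 conversion idea (not necessarily how the repo does it), a minimal sketch that halves every floating-point tensor in a checkpoint; the file names are assumptions:

```python
# Hedged sketch: convert a full-precision checkpoint to float16.
# "autoregressive.pth" / "autoregressive_fp16.pth" are assumed file names.
import torch

state = torch.load("autoregressive.pth", map_location="cpu")
state_fp16 = {
    k: (v.half() if torch.is_tensor(v) and v.is_floating_point() else v)
    for k, v in state.items()
}
torch.save(state_fp16, "autoregressive_fp16.pth")
```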
I tried to generate configuration with half-precision enabled, but it gives me this error:
Oops, don't know how I managed to get it working for me; fixed in commit 93b061fb4d.

Updated, and upon running start.bat I get this error:
Fixed, I had it right, but I copied over the typo fix.
I still get an error when trying to save the training configuration. I think it's still the same one from before?
For sure really fixed in 526a430c2a, or I'm blowing my brains out. I don't know how it reverted.

I tried running a very small dataset of 16 clips on half precision. Unfortunately it still gives OOM errors.
It seems to at least be training somewhat (albeit slowly with that 20s/it) before OOMing; try lowering the batch size to 2.

If not, then use this copium in place of your train.bat:

And if it absolutely will not let you after trying each of those, then I suppose I'll need to find some more VRAM savings somewhere else, like training out a model with smaller network parameters.
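Purely as a hedged, generic example (not necessarily the copium referred to above), one kind of tweak that can help with fragmentation-related CUDA OOMs is tuning PyTorch's allocator via PYTORCH_CUDA_ALLOC_CONF before launching training; the entry-point script and config path below are assumptions:

```python
# Hedged sketch: set PyTorch's CUDA allocator options to reduce fragmentation,
# then launch training. The training script and YAML path are assumptions.
import os
import subprocess
import sys

env = dict(os.environ)
env["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # limit large cached splits

subprocess.run(
    [sys.executable, "./src/train.py", "--yaml", "./training/finetune.yaml"],  # assumed paths
    env=env,
    check=True,
)
```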
I've had a breakthrough with being able to train even on my 2060.
Refer to #25.