Unable to complete training. #284
Reference: mrq/ai-voice-cloning#284
I have been trying to train/finetune a model for a few hours, but every time it gets stuck at step 13 and I get the following error:
[Training] RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:337] . unexpected pos 58478784 vs 58478736
It doesn't seem to be an OOM error. What could cause the training to always get stuck at epoch 13?
That's a PyTorch error; if you google it, you will find some suspected causes and possible workarounds.
Yeah, it seems to be a PyTorch issue.
For anyone wondering, the issue seemed to be that PyTorch needed more free disk space.
After moving my folder to an HDD with lots of free space, I could train normally.
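Since running out of disk space while a checkpoint is being written was the cause here, one way to catch it earlier is to check free space before each save. Below is a minimal sketch using only the Python standard library (`shutil.disk_usage`, `os.path.getsize`). It assumes the size of the previous checkpoint is a reasonable estimate for the next one; the helper names, the `last.pth` file name and the 2x safety factor are hypothetical and are not part of ai-voice-cloning or PyTorch.

```python
import os
import shutil


def free_bytes(directory: str) -> int:
    """Return the free space, in bytes, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free


def has_room_for_checkpoint(directory: str, expected_bytes: int,
                            safety_factor: float = 2.0) -> bool:
    """Hypothetical pre-save check: True if at least expected_bytes * safety_factor
    bytes are free. The factor leaves headroom for checkpoints that grow between saves."""
    if expected_bytes < 0:
        raise ValueError("expected_bytes must be non-negative")
    return free_bytes(directory) >= expected_bytes * safety_factor


def size_of_last_checkpoint(path: str) -> int:
    """Use an existing checkpoint's size as the estimate for the next one (0 if none exists yet)."""
    return os.path.getsize(path) if os.path.exists(path) else 0


if __name__ == "__main__":
    training_dir = "."  # hypothetical: point this at your training output folder
    last_ckpt = os.path.join(training_dir, "last.pth")  # hypothetical checkpoint file name
    needed = size_of_last_checkpoint(last_ckpt)
    print(f"free: {free_bytes(training_dir) / 1e9:.2f} GB")
    if not has_room_for_checkpoint(training_dir, needed):
        print("Not enough free space for the next checkpoint; free some space first.")
```

A check like this only warns before the save starts; the actual fix in this thread was still to move the training folder to a drive with more free space.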