Unable to complete training. #284

Closed
opened 2023-06-26 21:43:22 +00:00 by Atoli · 2 comments

I have been trying to train/fine-tune a model for a few hours, but every time it gets stuck at step 13 and I get the following error:

`[Training] RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:337] . unexpected pos 58478784 vs 58478736`

It doesn't seem to be an OOM error. What could cause the training to always get stuck at epoch 13?


That's a PyTorch error; if you google it, you will find some suspected causes and possible workarounds.

Author

> That's a PyTorch error; if you google it, you will find some suspected causes and possible workarounds.

Yeah, it seems to be a PyTorch issue.

For anyone wondering, the issue turned out to be that PyTorch needed more free disk space.

After moving my folder to an HDD with lots of free space, I could train normally.
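Since the fix in this thread was simply more free disk space, one cheap safeguard is to check the target drive before each checkpoint save, so a full disk fails with a clear message instead of a truncated file. The sketch below is a hypothetical helper, not part of ai-voice-cloning or PyTorch. It relies only on Python's standard `shutil.disk_usage`, and the `margin` factor and the idea of estimating the size from a previous checkpoint are assumptions.

```python
import os
import shutil


def free_bytes(path: str) -> int:
    """Return the free space, in bytes, on the filesystem that holds `path`."""
    # shutil.disk_usage needs an existing path, so walk up to the nearest
    # existing parent directory (the checkpoint file may not exist yet).
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    return shutil.disk_usage(probe).free


def ensure_space_for_checkpoint(path: str, expected_bytes: int, margin: float = 2.0) -> None:
    """Raise OSError before writing if the target disk looks too full.

    `expected_bytes` is an estimate of the checkpoint size (for example, the
    size of the previous checkpoint); `margin` is an assumed safety factor
    that leaves headroom for temporary files written alongside it.
    """
    needed = int(expected_bytes * margin)
    available = free_bytes(path)
    if available < needed:
        raise OSError(
            f"Only {available} bytes free near {path!r}, "
            f"but about {needed} bytes are needed to save a checkpoint."
        )
```

In a training loop you would call `ensure_space_for_checkpoint(ckpt_path, os.path.getsize(previous_ckpt))` right before `torch.save(...)`, where `ckpt_path` and `previous_ckpt` are placeholder names. This check only catches the disk-space cause; the same PyTorch error has been reported for other reasons too.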

Atoli closed this issue 2023-06-27 14:07:06 +00:00
Reference: mrq/ai-voice-cloning#284