Cannot reshape tensor / Training process hangs instead of closing #109

Closed
opened 2023-03-10 15:41:18 +07:00 by effluvialkraken · 2 comments

Whenever I train a model, the process hits 100.2% and then hangs there forever. No errors are displayed and Python continues to use about 4 GB of memory. Terminating the process with Ctrl+C causes the terminal to say "press any key", but doing so does nothing. Ctrl+C again, and it asks if I want to terminate a batch job. If I say yes, then it closes.

Afterwards, if I try to use any of the models generated, I get the error shown in the traceback.

Yeah, I'm not sure what caused it to crop up for Windows training, but sometimes the training process hangs after training completes and won't kill itself. You'll need to enter `tskill python` to kill all the Python processes, since they'll still be holding onto the resources.

I'll need to look into having AIVC kill the training process when it detects it's done, which is a bit tricky.

I'll leave this open until I implement the above, but I don't have a good time estimate on when.

The training script should print a message when it's done training, and AIVC should be able to use that as an indicator to kill the process. I still need to validate this.
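
For anyone curious, here's a rough sketch of that idea, not what AIVC actually does. It assumes the training script prints a recognizable line when it finishes; the `DONE_MARKER` text, the `run_until_done` helper, and the grace period are all hypothetical names and values picked for illustration.

```python
import subprocess

# Hypothetical sentinel: the real "done" message printed by the training
# script may be worded differently.
DONE_MARKER = "TRAINING DONE"


def run_until_done(cmd, done_marker=DONE_MARKER, grace_seconds=5.0):
    """Run cmd, collect its output, and stop it once done_marker is printed.

    Returns (saw_marker, lines), where lines is the output read so far.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is missed
        text=True,
        bufsize=1,  # line-buffered reads on our side
    )
    lines = []
    saw_marker = False
    for line in proc.stdout:
        lines.append(line.rstrip("\n"))
        if done_marker in line:
            saw_marker = True
            break  # stop reading; the process may never exit on its own

    if saw_marker:
        try:
            # Give the process a chance to exit cleanly first.
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # It hung after finishing, so force it to stop.
            proc.kill()
            proc.wait()
    else:
        # Output ended without the marker; just reap the process.
        proc.wait()

    proc.stdout.close()
    return saw_marker, lines
```

Something like `run_until_done(["python", "train.py"])` would be the call, where `train.py` is a placeholder for the real training entry point. Note that `proc.kill()` only stops the process that was launched directly, so any worker processes it spawned might still need `tskill python` or similar.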

mrq closed this issue 2023-03-11 16:43:28 +07:00
Reference: mrq/ai-voice-cloning#109