Error involving zipfile upon attempting to resume training. #170
Reference: mrq/ai-voice-cloning#170
Python Ver: 3.10.6
OS: Windows 10
GPU: RTX 2070 Super
What I was trying to do: My training failed because of the massive number of extra models saved by default, so I switched to saving one every 50 epochs, then pointed the config back to the resume state, "./training/desco/finetune/training_state/300.state", and the partially trained model, "./training/desco/finetune/models/300_gpt.pth". Since the webUI runs training for the full number of epochs no matter what, even if you're resuming, I had to copy the archived fine-tuning data from the folder over to the main finetune folder; I'm not sure whether that would affect anything.
In any case, upon trying to resume training, I received an error that it failed to read some sort of zipfile. I'm unsure what zipfile it's trying to access, or why it is, but here's the full console output:
If anyone has any ideas what it could be or why this is happening, please let me know. It'd be a real shame to lose 300 epochs.
The .pth files are actually zip archives. See if you can open your 300_gpt.pth in 7z or a similar archive program. If it's corrupted, you might be out of luck.
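If you'd rather check from Python than from an archive program, here is a minimal sketch using only the standard-library zipfile module. The function name check_checkpoint is made up for illustration, and it assumes the checkpoint was written in the zip-based format; whether the .state file is also stored as a zip is an assumption on my part. A file that was cut off mid-write will usually show up as "not a zip", because the zip directory sits at the end of the file.

```python
import zipfile
from pathlib import Path


def check_checkpoint(path):
    """Return a short status string saying whether `path` is a readable zip archive.

    Hypothetical helper for illustration; not part of the webUI.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    # is_zipfile looks for the end-of-central-directory record,
    # so a truncated file typically fails here.
    if not zipfile.is_zipfile(path):
        return "not a zip"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip reads every member and returns the first one
            # with a bad CRC or header, or None if all are fine.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return "corrupt"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return "ok"


if __name__ == "__main__":
    # Paths taken from the post above; adjust to your own setup.
    for candidate in (
        "./training/desco/finetune/models/300_gpt.pth",
        "./training/desco/finetune/training_state/300.state",
    ):
        print(candidate, "->", check_checkpoint(candidate))
```

An "ok" result only means the archive itself is intact; it doesn't prove the trainer will accept the contents.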