Training starts, then immediately stops and reports as "finished". #169
Python Version: 3.10.6
GPU: RTX 2070 Super (Max Q)
OS: Windows 10
Summary of what I was trying to do: upon trying to train a model, it trains for (presumably) zero steps, then reports back as having finished. The only consistent errors I've been getting are the gradient checkpointing one (below, in this stack) and another about `lr_scheduler.step()` being called before `optimizer.step()` (see the second stack). Any ideas?
And the lr_scheduler one (Though it didn't show in this instance):
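For reference, this is general PyTorch behaviour rather than anything specific to this repo: that warning fires whenever `scheduler.step()` runs in an iteration where `optimizer.step()` didn't, which commonly happens (harmlessly) when the AMP grad scaler skips an update. A minimal sketch of the ordering PyTorch expects, with placeholder model/optimizer/scheduler rather than this project's actual training loop:

```python
# Minimal PyTorch sketch of the step ordering behind that warning.
# The model, optimizer, and scheduler here are placeholders, not this repo's.
import torch

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)

for _ in range(10):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 16)).sum()
    loss.backward()
    optimizer.step()    # has to come first...
    scheduler.step()    # ...or PyTorch emits the "lr_scheduler.step() before optimizer.step()" warning
```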
This one occurs for me also, but AFAIK it's harmless. The only thing there that looks weird to me is:
But...
Your batch size is double your dataset size. Try reducing it to the same size.
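Rough sketch of why that matters, assuming the trainer computes steps per epoch by integer division and drops incomplete batches (the numbers below are purely illustrative):

```python
# Illustrative only -- not the repo's actual training code.
dataset_size = 64    # hypothetical
batch_size = 128     # double the dataset size, as described above

steps_per_epoch = dataset_size // batch_size
print(steps_per_epoch)  # 0 -> no full batches to iterate over, so training "finishes" immediately
```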
Oh, I did the same with a batch size matching the dataset size before this! I just adjusted it to double to make it evenly divisible by the gradient acc size, since that was the solution to an unrelated issue I was troubleshooting (a 'list index out of range' error). The original settings are as follows:
So, basically the same, but with a batch size of 74 and a gradient acc size of 18. Curiously, I don't get the out of range error anymore; now it's just the same set you saw above in the first stack, with the instantly finishing model training.
Huh, that's weird. Here's a log for some training I did earlier today to compare with:
Only thing I can think of is try reducing your dataset size to 64 so it divides evenly.
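For anyone else landing here, a quick sanity check along these lines catches both failure modes from this thread before launching a run. It's purely illustrative, not part of ai-voice-cloning's own validation; the function and argument names are made up:

```python
# Hypothetical sanity check -- not part of the repo's codebase.
def check_training_sizes(dataset_size: int, batch_size: int, grad_acc_size: int) -> None:
    if batch_size > dataset_size:
        raise ValueError("batch size is larger than the dataset: zero steps per epoch")
    if dataset_size % batch_size != 0:
        print("warning: dataset size is not a multiple of batch size; the last batch may be dropped")
    if batch_size % grad_acc_size != 0:
        raise ValueError("batch size does not divide evenly by the gradient accumulation size")

check_training_sizes(dataset_size=64, batch_size=64, grad_acc_size=16)  # passes with these made-up values
```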
Well, that solved it! Guess I won't go with near-prime numbers in the future, thanks a ton!