Question: What is the meaning of the blue and orange lines in training? #82
Reference: mrq/ai-voice-cloning#82
What are we looking at in the blue and orange lines in the training graph? What does a good graph look like, and what does a bad one look like?
Per the wiki:
I don't have a better answer than that to give you at the current moment in my fleeting free time.
Alright, now that I'm in a slightly better headspace, I can try to explain what the loss curves mean, but first a brief crash course on what the model does (to my understanding):
The autoregressive model predicts tokens as a
<speech conditioning>:<text tokens>:<MEL tokens>
string.

Now back to the scope of answering your question. Each curve is responsible for quantifying how accurate the model is:

- `text` loss quantifies how well the predicted text tokens match the source text. This doesn't necessarily need to have too low of a loss; in fact, trainings where it falls below the `mel` loss turn out unusable.
- `mel` loss quantifies how well the predicted speech tokens match the source audio. This definitely seems to benefit from low loss rates.
- `total` loss is a bit irrelevant, and I should probably hide it, since it almost always follows the `mel` loss due to how the `text` loss gets weighed.

There are also the validation versions of the text and mel losses, which quantify the de facto similarity of the generated output to the source output, as the validation dataset serves as outside data (as if you were generating normally). If there's a large deviation between the reported losses and the validation losses, then your model has probably started to overfit on the source material.
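To make the relationship between the curves concrete, here's a minimal sketch of how the losses combine and how an overfit signal shows up. The weights are illustrative values I picked for the example, not the project's actual training config:

```python
# Hypothetical loss weights for illustration only: the text loss is weighed
# far lighter than the mel loss, which is why total ~= mel in the graphs.
TEXT_LOSS_WEIGHT = 0.01
MEL_LOSS_WEIGHT = 1.0

def total_loss(text_loss: float, mel_loss: float) -> float:
    """Weighted sum of the two curves; dominated by the mel term."""
    return TEXT_LOSS_WEIGHT * text_loss + MEL_LOSS_WEIGHT * mel_loss

def looks_overfit(reported_loss: float, validation_loss: float,
                  tolerance: float = 0.1) -> bool:
    """Flag a large deviation between the reported (training) loss and the
    validation loss, the overfitting symptom described above."""
    return (validation_loss - reported_loss) > tolerance
```

With a small text weight, a text loss of 2.0 and a mel loss of 1.0 give a total of about 1.02, i.e. the total curve tracks the mel curve almost exactly.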
Below all of that is the learning rate graph, which shows what the current learning rate is at. It's not a huge indicator of how training is going, as the learning rate curve is deterministic: it follows the schedule you configured, regardless of how the model is doing.
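For example, the learning rate at any step can be computed purely from the schedule. This sketch assumes a MultiStepLR-style decay (milestones and decay factor are made-up values; your config's scheduler may differ):

```python
def lr_at(step: int, base_lr: float = 1e-4,
          milestones: tuple = (9, 18, 25), gamma: float = 0.5) -> float:
    """Learning rate for a MultiStepLR-style schedule: the rate is cut by
    `gamma` at each milestone, independent of any loss value."""
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * (gamma ** drops)
```

Since the curve depends only on the step count, it tells you where you are in the schedule, not how well the model is learning.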
Here's what a decent graph looks like for a small dataset. Here, you can see that it's probably at its "best" around epoch 20 (epoch, as my batch size = dataset size here), as after that point the de facto (validation) loss goes higher than the reported loss.
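The "pick the epoch before validation overtakes training" heuristic above can be sketched as follows. This is a rough illustration of the reading of the graph, not logic from the project itself:

```python
def best_epoch(reported_losses, validation_losses):
    """Return the last epoch at which the validation loss is still at or
    below the reported (training) loss, per the heuristic described above."""
    best = 0
    for epoch, (reported, validation) in enumerate(
            zip(reported_losses, validation_losses)):
        if validation <= reported:
            best = epoch
    return best
```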
Good explanation. You should probably drop that in the wiki somewhere.