Is Cos.Annealing ever a better option? #151
I often find multistep either learns too quickly (the graph drops fast) or only gets as far as about 1.2 before plateauing (and probably wouldn't get down to ~0.5 on the graph until something like epoch 20,000).
Though I'm still not sure what a good graph should look like. It feels like a gradual curve down to 0.5ish over 1000-2000 epochs should give strong results, but in practice either the curve drops within about 100 epochs, or it flattens out and never drops, lol.
Cos. Annealing tests have let me ... kind of make this curve, but the results so far haven't been stellar.
Maybe 50 epochs is enough?..
LR = 0.00005
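For comparing the two schedules side by side, here's a minimal sketch in plain PyTorch. The optimizer type, milestones, gamma, and T_max below are illustrative guesses, not the values this repo actually uses; only the 5e-5 base LR comes from above.

```python
# Minimal sketch comparing MultiStepLR and CosineAnnealingLR decay over epochs.
# Milestones, gamma, and T_max are assumptions for illustration only.
import torch

param = torch.nn.Parameter(torch.zeros(1))
base_lr = 5e-5  # the LR mentioned above

opt_multi = torch.optim.SGD([param], lr=base_lr)
opt_cos = torch.optim.SGD([param], lr=base_lr)

# MultiStepLR: multiplies the LR by gamma at each milestone epoch (step-wise drops).
multi = torch.optim.lr_scheduler.MultiStepLR(opt_multi, milestones=[50, 100, 150], gamma=0.5)
# CosineAnnealingLR: smoothly decays the LR toward eta_min over T_max epochs.
cos = torch.optim.lr_scheduler.CosineAnnealingLR(opt_cos, T_max=200, eta_min=base_lr * 0.01)

for epoch in range(200):
    # ... one epoch of training would happen here ...
    multi.step()
    cos.step()
    if epoch % 25 == 0:
        print(epoch, multi.get_last_lr()[0], cos.get_last_lr()[0])
```

The main behavioural difference: MultiStepLR holds the LR constant between milestones and then drops it in discrete steps, while cosine annealing keeps shrinking it smoothly all the way to T_max, which is roughly the "gradual curve down" shape described above.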
Hard to say without knowing your batch size and how many steps per epoch you have.
In this case it's a small dataset, 100 files, so a batch size of 100 and 1 step per epoch.
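For reference, a rough sketch of the steps-per-epoch arithmetic with those numbers (dataset size and batch size are taken from the comment above):

```python
# Rough arithmetic only; the dataset size and batch size come from the comment above.
import math

dataset_size = 100
batch_size = 100
steps_per_epoch = math.ceil(dataset_size / batch_size)  # 1 optimizer step per epoch
epochs = 50
total_steps = steps_per_epoch * epochs                   # only 50 optimizer steps in total
print(steps_per_epoch, total_steps)
```

With only 1 step per epoch, 50 epochs works out to just 50 optimizer updates in total, which is why the epoch count and schedule choice matter so much here.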
My gut feeling is that you'd want at least 100-200 epochs, but if your training set is close to a "standard" US English accent then you should be able to get away with less. How's the quality after 50?
It just feels like it's too fast, and so shouldn't be any good, haha. No actual evidence to back that up!
This is my current latest graph, which is more like what (in my warped mind) I would expect it to look like.
But of course I'm making this up as I go. I'm not even totally sure the yellow line is a good 'goal' for the green line to aim for. I have had some examples that went well below the yellow line and ended up sounding terrible, so it's really just an assumption.