[Training] Detected call of lr_scheduler.step() before optimizer.step() #276

Closed
opened 2023-06-21 12:47:17 +07:00 by ristoman · 2 comments

I'm having issues running my own training on a local machine.
Everything else seems to work fine, but after validating the training configuration and using it to train my own model, the process runs until I get this console output:

[Training] [2023-06-21T14:37:23.001458] 23-06-21 14:37:22.994 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-06-21T14:37:25.088231] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-06-21T14:37:27.465151] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-06-21T14:37:27.908479] C:\Users\andrea\Sites\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-06-21T14:37:27.908479]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

After that, the process hangs and I have to kill it. I'm pretty sure I followed the instructions to the letter. I understand the warning has to do with older versions of torch, but I would assume the app already deals with that, since torch 2.x is a requirement? Any help would be appreciated. Thanks!

For reference, here's my setup:

OS: Win10
GPU: Nvidia GeForce GTX 1070
torch: 2.0.1+cu118
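
For readers landing here from the same message: the warning text is about the order of two calls inside a training loop, not about the installed torch version. The following is a minimal sketch of the order the warning asks for, assuming a toy `nn.Linear` model, random data, and a `StepLR` schedule. All of these are illustrative choices and are not taken from this project's training code.

```python
import torch
from torch import nn

# Toy model and data, purely illustrative (not this project's trainer).
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate after every scheduler step.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

lrs = []
for epoch in range(3):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()   # update the weights first...
    scheduler.step()   # ...then advance the learning rate schedule
    lrs.append(scheduler.get_last_lr()[0])
```

With the calls in this order, the first weight update uses the initial learning rate of 0.1 before the schedule starts decaying it.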

Maybe it's because of `[Training] [2023-06-21T14:37:27.465151] NOTE: Redirects are currently not supported in Windows or MacOs.`

On Linux I get that message too and the training runs fine. With 2 GPUs it sometimes stops and has to be restarted, though. One GPU will keep going and the other will be at 0, but with no OOM.

I actually realized this is just a warning. Even though my setup is slow by training standards, after a while the training does kick in. So it's not a blocker as I first thought.
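
To illustrate why a `UserWarning` like this does not stop the run, here is a small sketch using only Python's standard `warnings` module. The function `step_with_warning` is a hypothetical stand-in for the library call that emits the message. It is not part of torch or of this project.

```python
import warnings


def step_with_warning():
    # Hypothetical stand-in: emits the warning, then carries on normally.
    warnings.warn(
        "Detected call of `lr_scheduler.step()` before `optimizer.step()`.",
        UserWarning,
    )
    return "still running"


# By default a warning is only reported; execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = step_with_warning()


def raises_when_escalated():
    # Opting in to the "error" filter turns the same warning into an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error", UserWarning)
        try:
            step_with_warning()
        except UserWarning:
            return True
    return False
```

Unless a filter like the one in `raises_when_escalated` is active, the call returns normally, which matches what was observed here: the message is printed and training continues once it gets going.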

Reference: mrq/ai-voice-cloning#276