Validate Training Configuration Gets Stuck in a Loop #208

Open
opened 2023-04-15 10:35:42 +07:00 by chigkim · 1 comment

First validate:

Batch size is not evenly divisible by the gradient accumulation size, adjusting gradient accumulation size to: 11
Batch ratio (10) is expected to exceed your VRAM capacity (14.748GB, suggested 4 batch size cap), adjusting gradient accumulation size to: 29
! EXPERIMENTAL ! BitsAndBytes requested.
For 500 epochs with 119 lines in batches of 119, iterating for 500 steps (1) steps per epoch)

Second validate:

Batch size is not evenly divisible by the gradient accumulation size, adjusting gradient accumulation size to: 26
! EXPERIMENTAL ! BitsAndBytes requested.
For 500 epochs with 119 lines in batches of 119, iterating for 500 steps (1) steps per epoch)

If I hit validate again, it returns to the first output and gets stuck in a loop, going back and forth between the first and second.

Both configurations give a "list index out of range" error when training.


Add another sample so that your batch size is divisible by more numbers. Probably not the "correct" solution but the most expedient one.
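To illustrate the arithmetic behind this suggestion, here is a minimal sketch. The `divisors` helper is hypothetical and not part of the project; it only lists which values divide a given batch size evenly. It assumes the batch size equals the 119 lines reported in the validation output.

```python
def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


# 119 is the line count (and batch size) reported in the validation output.
current = divisors(119)

# Hypothetical dataset with one extra sample added.
padded = divisors(120)

print(current)      # [1, 7, 17, 119]
print(len(padded))  # 16
```

Since 119 = 7 × 17, only four values divide it evenly, which leaves very few options when the gradient accumulation size is adjusted. With 120 samples there are 16 divisors, including 4, the batch size cap suggested in the log. Whether this also avoids the validation loop is not confirmed here.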

Reference: mrq/ai-voice-cloning#208