'list index out of range' error #159

Closed
opened 2023-03-20 06:35:23 +00:00 by nirurin · 3 comments

I was trying to use a custom LR schedule based on another training program (figured it was worth experimenting with), [25, 50, 100, 200], but it throws an error and won't work:

[Training] [2023-03-20T06:32:55.052116] Using BitsAndBytes optimizations
[Training] [2023-03-20T06:32:55.052116] Disabled distributed training.
[Training] [2023-03-20T06:32:55.052116] Path already exists. Rename it to [./training\patrick\finetune_archived_230320-063221]
[Training] [2023-03-20T06:32:55.052116] Loading from ./models/tortoise/dvae.pth
[Training] [2023-03-20T06:32:55.052116] Traceback (most recent call last):
[Training] [2023-03-20T06:32:55.052116] File "C:\Users\nirin\Desktop\AIVoice\ai-voice-cloning\src\train.py", line 68, in <module>
[Training] [2023-03-20T06:32:55.052116] train(config_path, args.launcher)
[Training] [2023-03-20T06:32:55.052116] File "C:\Users\nirin\Desktop\AIVoice\ai-voice-cloning\src\train.py", line 35, in train
[Training] [2023-03-20T06:32:55.052116] trainer.do_training()
[Training] [2023-03-20T06:32:55.052116] File "C:\Users\nirin\Desktop\AIVoice\ai-voice-cloning\./modules/dlas\codes\train.py", line 374, in do_training
[Training] [2023-03-20T06:32:55.053116] metric = self.do_step(train_data)
[Training] [2023-03-20T06:32:55.053116] File "C:\Users\nirin\Desktop\AIVoice\ai-voice-cloning\./modules/dlas\codes\train.py", line 242, in do_step
[Training] [2023-03-20T06:32:55.053116] gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)
[Training] [2023-03-20T06:32:55.053116] File "C:\Users\nirin\Desktop\AIVoice\ai-voice-cloning\./modules/dlas/codes\trainer\ExtensibleTrainer.py", line 303, in optimize_parameters
[Training] [2023-03-20T06:32:55.053116] ns = step.do_forward_backward(state, m, step_num, train=train_step, no_ddp_sync=(m+1 < self.batch_factor))
[Training] [2023-03-20T06:32:55.053116] File "C:\Users\nirin\Desktop\AIVoice\ai-voice-cloning\./modules/dlas/codes\trainer\steps.py", line 220, in do_forward_backward
[Training] [2023-03-20T06:32:55.053116] local_state[k] = v[grad_accum_step]
[Training] [2023-03-20T06:32:55.053116] IndexError: list index out of range
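For reference, a list like [25, 50, 100, 200] can be read as epoch milestones at which the learning rate decays, in the style of PyTorch's `torch.optim.lr_scheduler.MultiStepLR`. The sketch below assumes that reading; how ai-voice-cloning actually converts the list (epochs vs. iterations, decay factor) is not shown in this thread, so `base_lr` and `gamma` are hypothetical values and `multistep_lr` is an illustrative helper, not project code. Under that assumption, the schedule only changes the learning rate over time and has no bearing on how a batch is split for gradient accumulation.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(epoch: int, base_lr: float, milestones: Sequence[int], gamma: float) -> float:
    """Learning rate in effect at `epoch` under a MultiStepLR-style schedule:
    `base_lr` is multiplied by `gamma` once for every milestone already reached."""
    decays = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** decays


if __name__ == "__main__":
    schedule = [25, 50, 100, 200]  # the custom schedule from this issue
    base_lr = 1e-5                 # hypothetical base learning rate
    gamma = 0.5                    # hypothetical decay factor
    for epoch in (0, 24, 25, 50, 100, 200, 250):
        print(f"epoch {epoch:>3}: lr = {multistep_lr(epoch, base_lr, schedule, gamma):.3e}")
```

A milestone counts as reached on the epoch equal to it (epoch 25 already uses the first decay), which matches `MultiStepLR`'s use of `bisect_right`.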
nirurin changed title from custom LR schedule causing 'list index out of range' error to 'list index out of range' error 2023-03-20 06:52:06 +00:00
Author

Did a forced reinstall but no change. Still won't work with a custom-entered schedule for some reason. I know it used to work. Not sure what's going on.

Owner

Your gradient accumulation size is either too large, or your batch size isn't evenly divisible by it.
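The failing line in the traceback, `local_state[k] = v[grad_accum_step]`, picks one piece of the batch per accumulation step. A minimal sketch of how that can run out of pieces, assuming the trainer splits each batch the way `torch.chunk` does (an assumption, not confirmed in this thread): `torch.chunk` makes every piece `ceil(batch_size / chunks)` items long, so it can return fewer pieces than requested. `chunk_like_torch`, `run_accumulation` and `check_config` are hypothetical pure-Python stand-ins, not DLAS code.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_like_torch(batch: Sequence[T], chunks: int) -> List[Sequence[T]]:
    """Mimic torch.chunk along dim 0: every piece holds ceil(len(batch) / chunks)
    items except possibly the last, so fewer than `chunks` pieces may come back."""
    if chunks < 1 or len(batch) == 0:
        raise ValueError("need a non-empty batch and at least one chunk")
    size = math.ceil(len(batch) / chunks)
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def run_accumulation(batch_size: int, grad_accum: int) -> int:
    """Hypothetical accumulation loop: index one piece per step, like
    `v[grad_accum_step]` in steps.py. Returns the number of steps completed."""
    pieces = chunk_like_torch(list(range(batch_size)), grad_accum)
    for grad_accum_step in range(grad_accum):
        _ = pieces[grad_accum_step]  # IndexError when there are too few pieces
    return grad_accum


def check_config(batch_size: int, grad_accum: int) -> None:
    """Hypothetical pre-flight check: the batch size must be at least the
    gradient accumulation size and evenly divisible by it."""
    if grad_accum < 1:
        raise ValueError("gradient accumulation size must be at least 1")
    if grad_accum > batch_size:
        raise ValueError(f"gradient accumulation size {grad_accum} exceeds batch size {batch_size}")
    if batch_size % grad_accum != 0:
        raise ValueError(f"batch size {batch_size} is not divisible by gradient accumulation size {grad_accum}")


if __name__ == "__main__":
    print(len(chunk_like_torch(list(range(6)), 4)))  # 3 pieces, not 4
    try:
        run_accumulation(batch_size=6, grad_accum=4)
    except IndexError as exc:
        print("IndexError:", exc)  # prints: IndexError: list index out of range
    check_config(batch_size=128, grad_accum=4)  # passes silently
```

The divisibility check is stricter than `torch.chunk` strictly needs (8 samples over 3 steps still yields 3 pieces of 3, 3 and 2), but a divisible batch size guarantees exactly one equal-sized piece per accumulation step.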
mrq closed this issue 2023-03-20 13:12:51 +00:00
Author

> Your gradient accumulation size is either too large, or your batch size isn't evenly divisible by it.

Really? I'm using the same gradient accumulation size for both attempts, and the default schedule trains fine. It only fails when I use a custom schedule. Unless it's somehow borderline and the different schedule uses more RAM or something?!

Seems it's working now, so that must have been it! Thanks.
Reference: mrq/ai-voice-cloning#159