Training IndexError: list index out of range #62
Python 3.9
GTX 3090
Fresh install
Just trying to train. I was able to successfully train in a previous version.
[Training] [2023-03-05T19:28:38.502912] File "C:\Users\PC\Desktop\ai-voice-cloning\src\train.py", line 80, in train
[Training] [2023-03-05T19:28:38.503908] trainer.do_training()
[Training] [2023-03-05T19:28:38.503908] File "C:\Users\PC\Desktop\ai-voice-cloning./dlas\codes\train.py", line 331, in do_training
[Training] [2023-03-05T19:28:38.503908] self.do_step(train_data)
[Training] [2023-03-05T19:28:38.504905] File "C:\Users\PC\Desktop\ai-voice-cloning./dlas\codes\train.py", line 212, in do_step
[Training] [2023-03-05T19:28:38.504905] gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)
[Training] [2023-03-05T19:28:38.504905] File "C:\Users\PC\Desktop\ai-voice-cloning./dlas/codes\trainer\ExtensibleTrainer.py", line 303, in optimize_parameters
[Training] [2023-03-05T19:28:38.505902] ns = step.do_forward_backward(state, m, step_num, train=train_step, no_ddp_sync=(m+1 < self.batch_factor))
[Training] [2023-03-05T19:28:38.505902] File "C:\Users\PC\Desktop\ai-voice-cloning./dlas/codes\trainer\steps.py", line 220, in do_forward_backward
[Training] [2023-03-05T19:28:38.505902] local_state[k] = v[grad_accum_step]
[Training] [2023-03-05T19:28:38.506897] IndexError: list index out of range
Make sure you click Validate Training Configuration for your given settings before saving. desu I'm pretty sure this is specifically because Batch Size / Gradient Accumulation Size > 2, so validation will clamp it down.

Ah, it's working now. I must have forgotten to click it.
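For anyone hitting this later, a minimal sketch of the failure mode, assuming the trainer slices each batch into one chunk per gradient-accumulation step with something like torch.chunk (illustrative, not necessarily the repo's exact code): asking for more chunks than there are samples silently yields fewer chunks, so indexing by the step number overruns the list.

```python
import torch

batch = torch.randn(4, 10)                    # effective batch of 4 samples
chunks = list(torch.chunk(batch, chunks=8))   # 8 accumulation steps requested
print(len(chunks))                            # -> 4: chunk() won't pad out to 8

grad_accum_step = 5
chunks[grad_accum_step]                       # IndexError: list index out of range
```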
Different issue
Traceback (most recent call last):
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1032, in process_api
result = await self.call_function(
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 858, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\gradio\utils.py", line 448, in async_iteration
return next(iterator)
File "C:\Users\PC\Desktop\ai-voice-cloning\src\utils.py", line 877, in run_training
result, percent, message = training_state.parse( line=line, verbose=verbose, keep_x_past_datasets=keep_x_past_datasets, progress=progress )
File "C:\Users\PC\Desktop\ai-voice-cloning\src\utils.py", line 770, in parse
self.epoch_rate = f'{"{:.3f}".format(self.epoch_time_delta)}s/epoch' if self.epoch_time_delta >= 1 else f'{"{:.3f}".format(1/self.epoch_time_delta)}epoch/s' # I doubt anyone will have it/s rates, but its here
ZeroDivisionError: float division by zero
Lazily wrapped it in a try/catch block and one extra == 0 check in commit 35225a35da.

Error gone.
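For reference, a self-contained sketch of the kind of guard described (the actual change is in commit 35225a35da; the function name here is hypothetical):

```python
def format_epoch_rate(epoch_time_delta: float) -> str:
    """Guarded rewrite of the rate formatting from src/utils.py (sketch)."""
    try:
        if epoch_time_delta == 0:
            return ""  # no elapsed time measured yet; nothing to divide by
        if epoch_time_delta >= 1:
            return f"{epoch_time_delta:.3f}s/epoch"
        return f"{1 / epoch_time_delta:.3f}epoch/s"
    except ZeroDivisionError:
        return ""

print(format_epoch_rate(0.0))   # "" instead of ZeroDivisionError
print(format_epoch_rate(2.5))   # 2.500s/epoch
print(format_epoch_rate(0.25))  # 4.000epoch/s
```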
The training can exceed its epoch settings
Got it to replicate: the epoch counter manages to desync when it takes one step to complete an epoch.
I'll try and probe why that's so, despite having a line to re-sync (sketched below).
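The re-sync line amounts to deriving the epoch from the step count rather than incrementing it independently; a hypothetical sketch (names are illustrative, not the repo's actual fields):

```python
def resync_epoch(current_step: int, steps_per_epoch: int) -> int:
    # Deriving the epoch from the step count means the two counters
    # can never drift, even when an epoch is a single step.
    return current_step // steps_per_epoch

assert resync_epoch(current_step=5, steps_per_epoch=1) == 5
assert resync_epoch(current_step=7, steps_per_epoch=3) == 2
```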
Think I fixed it.