Can't identify what's wrong (potentially list index out of range) #436
First of all, I should mention that I'm a technical newbie, not a developer. I have some coding experience, but nothing nearly as complex as AI/data work. I downloaded this helpful open-source project for simple production purposes.
The problem is that when I run training to clone a voice, it simply doesn't work. I've searched this issue tracker and read through the output log repeatedly, trying to trace the cause of the error, but I can't find it. It looks like a list-index-out-of-range problem, but I always validate the training parameters before launching (though not before restarting the app).
Here are the parameters for the training (a quick sanity check on two of them follows the list):
Epochs: 200
Learning rate: 0.00001
MEL LR Ratio: 1
Text LR Ratio: 0.01
Batch size: 17
Gradient accumulation size: 8
Save freq: 5
Val freq: 5
Worker processes: 2
GPUs: 1
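One thing I keep second-guessing (my own speculation, not anything from the docs): the batch size isn't evenly divisible by the gradient accumulation size. A minimal sanity check, assuming the "Gradient" field maps to gradient accumulation (`mega_batch_factor` in the generated train.yaml):

```python
# Hypothetical sanity check for the two parameters above.
# Assumption: the UI's "Gradient" field is the gradient accumulation size
# (mega_batch_factor in the generated train.yaml).
batch_size = 17
grad_accum = 8

if batch_size % grad_accum != 0:
    # 17 % 8 == 1, so the batch cannot be split into 8 equal slices
    print(f"Warning: batch_size={batch_size} is not divisible by grad_accum={grad_accum}")
```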
My system specs:
Python 3.9 and 3.12 (I have both installed, but the one on PATH is 3.9)
NVIDIA RTX 2060
Spawning process: train.bat ./training/spiderman/train.yaml
[Training] [2023-11-01T14:51:23.374840]
[Training] [2023-11-01T14:51:23.381241] (venv) C:\voicecloning\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-11-01T14:51:27.128319] [2023-11-01 14:51:27,128] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-11-01T14:51:30.376107] 23-11-01 14:51:30.376 - INFO: name: spiderman
[Training] [2023-11-01T14:51:30.382113] model: extensibletrainer
[Training] [2023-11-01T14:51:30.386630] scale: 1
[Training] [2023-11-01T14:51:30.399021] gpu_ids: [0]
[Training] [2023-11-01T14:51:30.403023] start_step: 0
[Training] [2023-11-01T14:51:30.409569] checkpointing_enabled: True
[Training] [2023-11-01T14:51:30.414570] fp16: False
[Training] [2023-11-01T14:51:30.419094] bitsandbytes: True
[Training] [2023-11-01T14:51:30.424604] gpus: 1
[Training] [2023-11-01T14:51:30.431623] datasets:[
[Training] [2023-11-01T14:51:30.438834] train:[
[Training] [2023-11-01T14:51:30.443829] name: training
[Training] [2023-11-01T14:51:30.448340] n_workers: 2
[Training] [2023-11-01T14:51:30.453342] batch_size: 17
[Training] [2023-11-01T14:51:30.459865] mode: paired_voice_audio
[Training] [2023-11-01T14:51:30.465395] path: ./training/spiderman/train.txt
[Training] [2023-11-01T14:51:30.471407] fetcher_mode: ['lj']
[Training] [2023-11-01T14:51:30.476775] phase: train
[Training] [2023-11-01T14:51:30.481764] max_wav_length: 255995
[Training] [2023-11-01T14:51:30.487284] max_text_length: 200
[Training] [2023-11-01T14:51:30.492286] sample_rate: 22050
[Training] [2023-11-01T14:51:30.499808] load_conditioning: True
[Training] [2023-11-01T14:51:30.507345] num_conditioning_candidates: 2
[Training] [2023-11-01T14:51:30.513342] conditioning_length: 44000
[Training] [2023-11-01T14:51:30.519874] use_bpe_tokenizer: True
[Training] [2023-11-01T14:51:30.523855] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-11-01T14:51:30.529010] load_aligned_codes: False
[Training] [2023-11-01T14:51:30.535537] data_type: img
[Training] [2023-11-01T14:51:30.540538] ]
[Training] [2023-11-01T14:51:30.545038] val:[
[Training] [2023-11-01T14:51:30.551081] name: validation
[Training] [2023-11-01T14:51:30.556580] n_workers: 2
[Training] [2023-11-01T14:51:30.561575] batch_size: 0
[Training] [2023-11-01T14:51:30.566664] mode: paired_voice_audio
[Training] [2023-11-01T14:51:30.571662] path: ./training/spiderman/validation.txt
[Training] [2023-11-01T14:51:30.576181] fetcher_mode: ['lj']
[Training] [2023-11-01T14:51:30.581181] phase: val
[Training] [2023-11-01T14:51:30.585699] max_wav_length: 255995
[Training] [2023-11-01T14:51:30.590697] max_text_length: 200
[Training] [2023-11-01T14:51:30.595881] sample_rate: 22050
[Training] [2023-11-01T14:51:30.600882] load_conditioning: True
[Training] [2023-11-01T14:51:30.606003] num_conditioning_candidates: 2
[Training] [2023-11-01T14:51:30.612005] conditioning_length: 44000
[Training] [2023-11-01T14:51:30.616531] use_bpe_tokenizer: True
[Training] [2023-11-01T14:51:30.620516] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-11-01T14:51:30.627081] load_aligned_codes: False
[Training] [2023-11-01T14:51:30.633073] data_type: img
[Training] [2023-11-01T14:51:30.637588] ]
[Training] [2023-11-01T14:51:30.643591] ]
[Training] [2023-11-01T14:51:30.648118] steps:[
[Training] [2023-11-01T14:51:30.653118] gpt_train:[
[Training] [2023-11-01T14:51:30.658645] training: gpt
[Training] [2023-11-01T14:51:30.662645] loss_log_buffer: 500
[Training] [2023-11-01T14:51:30.669179] optimizer: adamw
[Training] [2023-11-01T14:51:30.674693] optimizer_params:[
[Training] [2023-11-01T14:51:30.680703] lr: 1e-05
[Training] [2023-11-01T14:51:30.687226] weight_decay: 0.01
[Training] [2023-11-01T14:51:30.693229] beta1: 0.9
[Training] [2023-11-01T14:51:30.697744] beta2: 0.96
[Training] [2023-11-01T14:51:30.702745] ]
[Training] [2023-11-01T14:51:30.709278] clip_grad_eps: 4
[Training] [2023-11-01T14:51:30.714786] injectors:[
[Training] [2023-11-01T14:51:30.719808] paired_to_mel:[
[Training] [2023-11-01T14:51:30.726339] type: torch_mel_spectrogram
[Training] [2023-11-01T14:51:30.730867] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-11-01T14:51:30.738391] in: wav
[Training] [2023-11-01T14:51:30.744908] out: paired_mel
[Training] [2023-11-01T14:51:30.752922] ]
[Training] [2023-11-01T14:51:30.759439] paired_cond_to_mel:[
[Training] [2023-11-01T14:51:30.764437] type: for_each
[Training] [2023-11-01T14:51:30.769541] subtype: torch_mel_spectrogram
[Training] [2023-11-01T14:51:30.774539] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-11-01T14:51:30.779776] in: conditioning
[Training] [2023-11-01T14:51:30.784774] out: paired_conditioning_mel
[Training] [2023-11-01T14:51:30.789967] ]
[Training] [2023-11-01T14:51:30.795497] to_codes:[
[Training] [2023-11-01T14:51:30.800495] type: discrete_token
[Training] [2023-11-01T14:51:30.806012] in: paired_mel
[Training] [2023-11-01T14:51:30.810013] out: paired_mel_codes
[Training] [2023-11-01T14:51:30.815540] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-11-01T14:51:30.821558] ]
[Training] [2023-11-01T14:51:30.827080] paired_fwd_text:[
[Training] [2023-11-01T14:51:30.834598] type: generator
[Training] [2023-11-01T14:51:30.838605] generator: gpt
[Training] [2023-11-01T14:51:30.842608] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-11-01T14:51:30.848136] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-11-01T14:51:30.853136] ]
[Training] [2023-11-01T14:51:30.858658] ]
[Training] [2023-11-01T14:51:30.863656] losses:[
[Training] [2023-11-01T14:51:30.870092] text_ce:[
[Training] [2023-11-01T14:51:30.875604] type: direct
[Training] [2023-11-01T14:51:30.881601] weight: 0.01
[Training] [2023-11-01T14:51:30.886002] key: loss_text_ce
[Training] [2023-11-01T14:51:30.891015] ]
[Training] [2023-11-01T14:51:30.896525] mel_ce:[
[Training] [2023-11-01T14:51:30.901530] type: direct
[Training] [2023-11-01T14:51:30.907051] weight: 1
[Training] [2023-11-01T14:51:30.917570] key: loss_mel_ce
[Training] [2023-11-01T14:51:30.921569] ]
[Training] [2023-11-01T14:51:30.927091] ]
[Training] [2023-11-01T14:51:30.932090] ]
[Training] [2023-11-01T14:51:30.937611] ]
[Training] [2023-11-01T14:51:30.943607] networks:[
[Training] [2023-11-01T14:51:30.948127] gpt:[
[Training] [2023-11-01T14:51:30.953127] type: generator
[Training] [2023-11-01T14:51:30.957645] which_model_G: unified_voice2
[Training] [2023-11-01T14:51:30.962642] kwargs:[
[Training] [2023-11-01T14:51:30.967168] layers: 30
[Training] [2023-11-01T14:51:30.972167] model_dim: 1024
[Training] [2023-11-01T14:51:30.976683] heads: 16
[Training] [2023-11-01T14:51:30.981681] max_text_tokens: 402
[Training] [2023-11-01T14:51:30.987216] max_mel_tokens: 604
[Training] [2023-11-01T14:51:30.992216] max_conditioning_inputs: 2
[Training] [2023-11-01T14:51:30.996737] mel_length_compression: 1024
[Training] [2023-11-01T14:51:31.000734] number_text_tokens: 256
[Training] [2023-11-01T14:51:31.007257] number_mel_codes: 8194
[Training] [2023-11-01T14:51:31.011257] start_mel_token: 8192
[Training] [2023-11-01T14:51:31.014761] stop_mel_token: 8193
[Training] [2023-11-01T14:51:31.019776] start_text_token: 255
[Training] [2023-11-01T14:51:31.023774] train_solo_embeddings: False
[Training] [2023-11-01T14:51:31.030304] use_mel_codes_as_input: True
[Training] [2023-11-01T14:51:31.034821] checkpointing: True
[Training] [2023-11-01T14:51:31.039841] tortoise_compat: True
[Training] [2023-11-01T14:51:31.043837] ]
[Training] [2023-11-01T14:51:31.048360] ]
[Training] [2023-11-01T14:51:31.052363] ]
[Training] [2023-11-01T14:51:31.056884] path:[
[Training] [2023-11-01T14:51:31.060884] strict_load: True
[Training] [2023-11-01T14:51:31.064394] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-11-01T14:51:31.070401] root: ./
[Training] [2023-11-01T14:51:31.074404] experiments_root: ./training\spiderman\finetune
[Training] [2023-11-01T14:51:31.079681] models: ./training\spiderman\finetune\models
[Training] [2023-11-01T14:51:31.084684] training_state: ./training\spiderman\finetune\training_state
[Training] [2023-11-01T14:51:31.088205] log: ./training\spiderman\finetune
[Training] [2023-11-01T14:51:31.092203] val_images: ./training\spiderman\finetune\val_images
[Training] [2023-11-01T14:51:31.095719] ]
[Training] [2023-11-01T14:51:31.102729] train:[
[Training] [2023-11-01T14:51:31.107246] niter: 200
[Training] [2023-11-01T14:51:31.112247] warmup_iter: -1
[Training] [2023-11-01T14:51:31.116777] mega_batch_factor: 8
[Training] [2023-11-01T14:51:31.119779] val_freq: 5
[Training] [2023-11-01T14:51:31.123779] ema_enabled: False
[Training] [2023-11-01T14:51:31.127299] default_lr_scheme: MultiStepLR
[Training] [2023-11-01T14:51:31.132300] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50]
[Training] [2023-11-01T14:51:31.135838] lr_gamma: 0.5
[Training] [2023-11-01T14:51:31.139826] ]
[Training] [2023-11-01T14:51:31.144824] eval:[
[Training] [2023-11-01T14:51:31.149938] pure: False
[Training] [2023-11-01T14:51:31.154993] output_state: gen
[Training] [2023-11-01T14:51:31.159004] ]
[Training] [2023-11-01T14:51:31.163001] logger:[
[Training] [2023-11-01T14:51:31.168537] save_checkpoint_freq: 5
[Training] [2023-11-01T14:51:31.172536] visuals: ['gen', 'mel']
[Training] [2023-11-01T14:51:31.177752] visual_debug_rate: 5
[Training] [2023-11-01T14:51:31.181754] is_mel_spectrogram: True
[Training] [2023-11-01T14:51:31.186266] ]
[Training] [2023-11-01T14:51:31.190275] is_train: True
[Training] [2023-11-01T14:51:31.196799] dist: False
[Training] [2023-11-01T14:51:31.202799]
[Training] [2023-11-01T14:51:31.208319] 23-11-01 14:51:30.376 - INFO: Random seed: 522
[Training] [2023-11-01T14:51:32.133146] 23-11-01 14:51:32.133 - INFO: Number of training data elements: 17, iters: 1
[Training] [2023-11-01T14:51:32.138661] 23-11-01 14:51:32.133 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-11-01T14:51:33.402351] C:\voicecloning\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-11-01T14:51:33.406865] warnings.warn(
[Training] [2023-11-01T14:51:44.558355] 23-11-01 14:51:44.558 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-11-01T14:51:45.722417] 23-11-01 14:51:45.707 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-11-01T14:51:48.280470] [2023-11-01 14:51:48,280] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-11-01T14:51:48.341754] [2023-11-01 14:51:48,341] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-11-01T14:51:52.106996] C:\voicecloning\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-11-01T14:51:52.107980] warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-11-01T14:51:54.674351] C:\voicecloning\ai-voice-cloning\venv\lib\site-packages\torch\utils\checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
[Training] [2023-11-01T14:51:54.674351] warnings.warn(
[Training] [2023-11-01T14:54:03.905786] Disabled distributed training.
[Training] [2023-11-01T14:54:03.905786] Path already exists. Rename it to [./training\spiderman\finetune_archived_231101-145130]
[Training] [2023-11-01T14:54:03.906913] Loading from ./models/tortoise/dvae.pth
[Training] [2023-11-01T14:54:03.906913] Traceback (most recent call last):
[Training] [2023-11-01T14:54:03.906913] File "C:\voicecloning\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-11-01T14:54:03.916383] train(config_path, args.launcher)
[Training] [2023-11-01T14:54:03.916383] File "C:\voicecloning\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-11-01T14:54:03.916383] trainer.do_training()
[Training] [2023-11-01T14:54:03.916383] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-11-01T14:54:03.917387] metric = self.do_step(train_data)
[Training] [2023-11-01T14:54:03.917387] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-11-01T14:54:03.917387] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-11-01T14:54:03.917387] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-11-01T14:54:03.918397] ns = step.do_forward_backward(
[Training] [2023-11-01T14:54:03.918397] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-11-01T14:54:03.918397] local_state[k] = v[grad_accum_step]
[Training] [2023-11-01T14:54:03.918397] IndexError: list index out of range
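If I'm reading the traceback right, `do_forward_backward` indexes one batch chunk per accumulation step, and the chunk list comes up short. Here is a minimal sketch of what I think happens, with made-up data; the splitting call is my assumption, the real code is in `modules/dlas/dlas/trainer/steps.py`:

```python
import torch

# Reproduces the failing pattern from the traceback with dummy data.
# Assumption: the trainer splits each batch into mega_batch_factor chunks
# (torch.chunk or similar) and indexes one chunk per accumulation step.
batch = torch.zeros(17)                      # 17 training elements, batch size 17
mega_batch_factor = 8                        # from the train.yaml dump above

chunks = list(torch.chunk(batch, mega_batch_factor))
print(len(chunks))                           # 6, not 8: torch.chunk may return fewer chunks

for grad_accum_step in range(mega_batch_factor):
    local_state = chunks[grad_accum_step]    # IndexError: list index out of range
                                             # once grad_accum_step >= len(chunks)
```

If that guess is right, a batch size divisible by the gradient accumulation size (e.g. 16 with accumulation 8, or a smaller accumulation value) should avoid the crash, but I'd appreciate confirmation from someone who knows the trainer internals.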