Can't identify what's wrong (potentially list index out of range) #436

Open
opened 2023-11-01 09:14:57 +00:00 by AmirkhanIdel · 0 comments

First of all, I need to mention that I'm a technical newbie and not a developer. I've had some experience with coding, but nothing nearly as complex as data/AI work. I downloaded this helpful open-source project for simple production purposes.

The problem is that when I run a training job to clone a voice, it simply doesn't work. I've searched this issue tracker and read through the output log repeatedly to trace the cause of the error, but I can't find it. It looks like a list index out of range problem, but I always validate the training parameters before launching (though not before I restart the app).

Here are the training parameters:
Epochs: 200
Learning rate: 0.00001
MEL LR Ratio: 1
Text LR Ratio: 0.01
Batch size: 17
Gradient: 8
Save freq: 5
Val freq: 5
Worker processes: 2
GPUs: 1

My setup:
Python 3.9 and 3.12 (I installed both, but the one on PATH is 3.9)
NVIDIA RTX 2060

Spawning process: train.bat ./training/spiderman/train.yaml
[Training] [2023-11-01T14:51:23.374840]
[Training] [2023-11-01T14:51:23.381241] (venv) C:\voicecloning\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-11-01T14:51:27.128319] [2023-11-01 14:51:27,128] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-11-01T14:51:30.376107] 23-11-01 14:51:30.376 - INFO: name: spiderman
[Training] [2023-11-01T14:51:30.382113] model: extensibletrainer
[Training] [2023-11-01T14:51:30.386630] scale: 1
[Training] [2023-11-01T14:51:30.399021] gpu_ids: [0]
[Training] [2023-11-01T14:51:30.403023] start_step: 0
[Training] [2023-11-01T14:51:30.409569] checkpointing_enabled: True
[Training] [2023-11-01T14:51:30.414570] fp16: False
[Training] [2023-11-01T14:51:30.419094] bitsandbytes: True
[Training] [2023-11-01T14:51:30.424604] gpus: 1
[Training] [2023-11-01T14:51:30.431623] datasets:[
[Training] [2023-11-01T14:51:30.438834] train:[
[Training] [2023-11-01T14:51:30.443829] name: training
[Training] [2023-11-01T14:51:30.448340] n_workers: 2
[Training] [2023-11-01T14:51:30.453342] batch_size: 17
[Training] [2023-11-01T14:51:30.459865] mode: paired_voice_audio
[Training] [2023-11-01T14:51:30.465395] path: ./training/spiderman/train.txt
[Training] [2023-11-01T14:51:30.471407] fetcher_mode: ['lj']
[Training] [2023-11-01T14:51:30.476775] phase: train
[Training] [2023-11-01T14:51:30.481764] max_wav_length: 255995
[Training] [2023-11-01T14:51:30.487284] max_text_length: 200
[Training] [2023-11-01T14:51:30.492286] sample_rate: 22050
[Training] [2023-11-01T14:51:30.499808] load_conditioning: True
[Training] [2023-11-01T14:51:30.507345] num_conditioning_candidates: 2
[Training] [2023-11-01T14:51:30.513342] conditioning_length: 44000
[Training] [2023-11-01T14:51:30.519874] use_bpe_tokenizer: True
[Training] [2023-11-01T14:51:30.523855] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-11-01T14:51:30.529010] load_aligned_codes: False
[Training] [2023-11-01T14:51:30.535537] data_type: img
[Training] [2023-11-01T14:51:30.540538] ]
[Training] [2023-11-01T14:51:30.545038] val:[
[Training] [2023-11-01T14:51:30.551081] name: validation
[Training] [2023-11-01T14:51:30.556580] n_workers: 2
[Training] [2023-11-01T14:51:30.561575] batch_size: 0
[Training] [2023-11-01T14:51:30.566664] mode: paired_voice_audio
[Training] [2023-11-01T14:51:30.571662] path: ./training/spiderman/validation.txt
[Training] [2023-11-01T14:51:30.576181] fetcher_mode: ['lj']
[Training] [2023-11-01T14:51:30.581181] phase: val
[Training] [2023-11-01T14:51:30.585699] max_wav_length: 255995
[Training] [2023-11-01T14:51:30.590697] max_text_length: 200
[Training] [2023-11-01T14:51:30.595881] sample_rate: 22050
[Training] [2023-11-01T14:51:30.600882] load_conditioning: True
[Training] [2023-11-01T14:51:30.606003] num_conditioning_candidates: 2
[Training] [2023-11-01T14:51:30.612005] conditioning_length: 44000
[Training] [2023-11-01T14:51:30.616531] use_bpe_tokenizer: True
[Training] [2023-11-01T14:51:30.620516] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-11-01T14:51:30.627081] load_aligned_codes: False
[Training] [2023-11-01T14:51:30.633073] data_type: img
[Training] [2023-11-01T14:51:30.637588] ]
[Training] [2023-11-01T14:51:30.643591] ]
[Training] [2023-11-01T14:51:30.648118] steps:[
[Training] [2023-11-01T14:51:30.653118] gpt_train:[
[Training] [2023-11-01T14:51:30.658645] training: gpt
[Training] [2023-11-01T14:51:30.662645] loss_log_buffer: 500
[Training] [2023-11-01T14:51:30.669179] optimizer: adamw
[Training] [2023-11-01T14:51:30.674693] optimizer_params:[
[Training] [2023-11-01T14:51:30.680703] lr: 1e-05
[Training] [2023-11-01T14:51:30.687226] weight_decay: 0.01
[Training] [2023-11-01T14:51:30.693229] beta1: 0.9
[Training] [2023-11-01T14:51:30.697744] beta2: 0.96
[Training] [2023-11-01T14:51:30.702745] ]
[Training] [2023-11-01T14:51:30.709278] clip_grad_eps: 4
[Training] [2023-11-01T14:51:30.714786] injectors:[
[Training] [2023-11-01T14:51:30.719808] paired_to_mel:[
[Training] [2023-11-01T14:51:30.726339] type: torch_mel_spectrogram
[Training] [2023-11-01T14:51:30.730867] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-11-01T14:51:30.738391] in: wav
[Training] [2023-11-01T14:51:30.744908] out: paired_mel
[Training] [2023-11-01T14:51:30.752922] ]
[Training] [2023-11-01T14:51:30.759439] paired_cond_to_mel:[
[Training] [2023-11-01T14:51:30.764437] type: for_each
[Training] [2023-11-01T14:51:30.769541] subtype: torch_mel_spectrogram
[Training] [2023-11-01T14:51:30.774539] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-11-01T14:51:30.779776] in: conditioning
[Training] [2023-11-01T14:51:30.784774] out: paired_conditioning_mel
[Training] [2023-11-01T14:51:30.789967] ]
[Training] [2023-11-01T14:51:30.795497] to_codes:[
[Training] [2023-11-01T14:51:30.800495] type: discrete_token
[Training] [2023-11-01T14:51:30.806012] in: paired_mel
[Training] [2023-11-01T14:51:30.810013] out: paired_mel_codes
[Training] [2023-11-01T14:51:30.815540] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-11-01T14:51:30.821558] ]
[Training] [2023-11-01T14:51:30.827080] paired_fwd_text:[
[Training] [2023-11-01T14:51:30.834598] type: generator
[Training] [2023-11-01T14:51:30.838605] generator: gpt
[Training] [2023-11-01T14:51:30.842608] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-11-01T14:51:30.848136] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-11-01T14:51:30.853136] ]
[Training] [2023-11-01T14:51:30.858658] ]
[Training] [2023-11-01T14:51:30.863656] losses:[
[Training] [2023-11-01T14:51:30.870092] text_ce:[
[Training] [2023-11-01T14:51:30.875604] type: direct
[Training] [2023-11-01T14:51:30.881601] weight: 0.01
[Training] [2023-11-01T14:51:30.886002] key: loss_text_ce
[Training] [2023-11-01T14:51:30.891015] ]
[Training] [2023-11-01T14:51:30.896525] mel_ce:[
[Training] [2023-11-01T14:51:30.901530] type: direct
[Training] [2023-11-01T14:51:30.907051] weight: 1
[Training] [2023-11-01T14:51:30.917570] key: loss_mel_ce
[Training] [2023-11-01T14:51:30.921569] ]
[Training] [2023-11-01T14:51:30.927091] ]
[Training] [2023-11-01T14:51:30.932090] ]
[Training] [2023-11-01T14:51:30.937611] ]
[Training] [2023-11-01T14:51:30.943607] networks:[
[Training] [2023-11-01T14:51:30.948127] gpt:[
[Training] [2023-11-01T14:51:30.953127] type: generator
[Training] [2023-11-01T14:51:30.957645] which_model_G: unified_voice2
[Training] [2023-11-01T14:51:30.962642] kwargs:[
[Training] [2023-11-01T14:51:30.967168] layers: 30
[Training] [2023-11-01T14:51:30.972167] model_dim: 1024
[Training] [2023-11-01T14:51:30.976683] heads: 16
[Training] [2023-11-01T14:51:30.981681] max_text_tokens: 402
[Training] [2023-11-01T14:51:30.987216] max_mel_tokens: 604
[Training] [2023-11-01T14:51:30.992216] max_conditioning_inputs: 2
[Training] [2023-11-01T14:51:30.996737] mel_length_compression: 1024
[Training] [2023-11-01T14:51:31.000734] number_text_tokens: 256
[Training] [2023-11-01T14:51:31.007257] number_mel_codes: 8194
[Training] [2023-11-01T14:51:31.011257] start_mel_token: 8192
[Training] [2023-11-01T14:51:31.014761] stop_mel_token: 8193
[Training] [2023-11-01T14:51:31.019776] start_text_token: 255
[Training] [2023-11-01T14:51:31.023774] train_solo_embeddings: False
[Training] [2023-11-01T14:51:31.030304] use_mel_codes_as_input: True
[Training] [2023-11-01T14:51:31.034821] checkpointing: True
[Training] [2023-11-01T14:51:31.039841] tortoise_compat: True
[Training] [2023-11-01T14:51:31.043837] ]
[Training] [2023-11-01T14:51:31.048360] ]
[Training] [2023-11-01T14:51:31.052363] ]
[Training] [2023-11-01T14:51:31.056884] path:[
[Training] [2023-11-01T14:51:31.060884] strict_load: True
[Training] [2023-11-01T14:51:31.064394] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-11-01T14:51:31.070401] root: ./
[Training] [2023-11-01T14:51:31.074404] experiments_root: ./training\spiderman\finetune
[Training] [2023-11-01T14:51:31.079681] models: ./training\spiderman\finetune\models
[Training] [2023-11-01T14:51:31.084684] training_state: ./training\spiderman\finetune\training_state
[Training] [2023-11-01T14:51:31.088205] log: ./training\spiderman\finetune
[Training] [2023-11-01T14:51:31.092203] val_images: ./training\spiderman\finetune\val_images
[Training] [2023-11-01T14:51:31.095719] ]
[Training] [2023-11-01T14:51:31.102729] train:[
[Training] [2023-11-01T14:51:31.107246] niter: 200
[Training] [2023-11-01T14:51:31.112247] warmup_iter: -1
[Training] [2023-11-01T14:51:31.116777] mega_batch_factor: 8
[Training] [2023-11-01T14:51:31.119779] val_freq: 5
[Training] [2023-11-01T14:51:31.123779] ema_enabled: False
[Training] [2023-11-01T14:51:31.127299] default_lr_scheme: MultiStepLR
[Training] [2023-11-01T14:51:31.132300] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50]
[Training] [2023-11-01T14:51:31.135838] lr_gamma: 0.5
[Training] [2023-11-01T14:51:31.139826] ]
[Training] [2023-11-01T14:51:31.144824] eval:[
[Training] [2023-11-01T14:51:31.149938] pure: False
[Training] [2023-11-01T14:51:31.154993] output_state: gen
[Training] [2023-11-01T14:51:31.159004] ]
[Training] [2023-11-01T14:51:31.163001] logger:[
[Training] [2023-11-01T14:51:31.168537] save_checkpoint_freq: 5
[Training] [2023-11-01T14:51:31.172536] visuals: ['gen', 'mel']
[Training] [2023-11-01T14:51:31.177752] visual_debug_rate: 5
[Training] [2023-11-01T14:51:31.181754] is_mel_spectrogram: True
[Training] [2023-11-01T14:51:31.186266] ]
[Training] [2023-11-01T14:51:31.190275] is_train: True
[Training] [2023-11-01T14:51:31.196799] dist: False
[Training] [2023-11-01T14:51:31.202799]
[Training] [2023-11-01T14:51:31.208319] 23-11-01 14:51:30.376 - INFO: Random seed: 522
[Training] [2023-11-01T14:51:32.133146] 23-11-01 14:51:32.133 - INFO: Number of training data elements: 17, iters: 1
[Training] [2023-11-01T14:51:32.138661] 23-11-01 14:51:32.133 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-11-01T14:51:33.402351] C:\voicecloning\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead, or if you are using the Trainer API, pass gradient_checkpointing=True in your TrainingArguments.
[Training] [2023-11-01T14:51:33.406865] warnings.warn(
[Training] [2023-11-01T14:51:44.558355] 23-11-01 14:51:44.558 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-11-01T14:51:45.722417] 23-11-01 14:51:45.707 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-11-01T14:51:48.280470] [2023-11-01 14:51:48,280] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-11-01T14:51:48.341754] [2023-11-01 14:51:48,341] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-11-01T14:51:52.106996] C:\voicecloning\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-11-01T14:51:52.107980] warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
[Training] [2023-11-01T14:51:54.674351] C:\voicecloning\ai-voice-cloning\venv\lib\site-packages\torch\utils\checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
[Training] [2023-11-01T14:51:54.674351] warnings.warn(
[Training] [2023-11-01T14:54:03.905786] Disabled distributed training.
[Training] [2023-11-01T14:54:03.905786] Path already exists. Rename it to [./training\spiderman\finetune_archived_231101-145130]
[Training] [2023-11-01T14:54:03.906913] Loading from ./models/tortoise/dvae.pth
[Training] [2023-11-01T14:54:03.906913] Traceback (most recent call last):
[Training] [2023-11-01T14:54:03.906913] File "C:\voicecloning\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-11-01T14:54:03.916383] train(config_path, args.launcher)
[Training] [2023-11-01T14:54:03.916383] File "C:\voicecloning\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-11-01T14:54:03.916383] trainer.do_training()
[Training] [2023-11-01T14:54:03.916383] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-11-01T14:54:03.917387] metric = self.do_step(train_data)
[Training] [2023-11-01T14:54:03.917387] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-11-01T14:54:03.917387] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-11-01T14:54:03.917387] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-11-01T14:54:03.918397] ns = step.do_forward_backward(
[Training] [2023-11-01T14:54:03.918397] File "C:\voicecloning\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-11-01T14:54:03.918397] local_state[k] = v[grad_accum_step]
[Training] [2023-11-01T14:54:03.918397] IndexError: list index out of range
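
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing line indexes a list with `grad_accum_step`, and the log shows `batch_size: 17` with `mega_batch_factor: 8`. If the trainer splits each batch into `mega_batch_factor` pieces the way `torch.chunk` does (an assumption, not checked against the DLAS source), then 17 items cannot be split into 8 pieces, and fewer pieces come back than the loop expects. The helper `chunk_sizes` below is hypothetical and only mimics the documented `torch.chunk` sizing rule in plain Python.

```python
import math


def chunk_sizes(n_items: int, n_chunks: int) -> list:
    """Piece sizes when n_items are split following torch.chunk's rule:
    every piece holds ceil(n_items / n_chunks) items, except possibly the last."""
    size = math.ceil(n_items / n_chunks)
    sizes = []
    remaining = n_items
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


batch_size = 17  # "Batch size" from the post
grad_accum = 8   # "Gradient" from the post (mega_batch_factor in the log)

pieces = chunk_sizes(batch_size, grad_accum)
print(pieces, len(pieces))

# Assumed shape of the trainer loop: one lookup per accumulation step.
for grad_accum_step in range(grad_accum):
    if grad_accum_step >= len(pieces):
        print(f"step {grad_accum_step}: no piece left, indexing here would raise IndexError")
        break
```

Under that assumption, a batch size that divides evenly by the gradient accumulation value (for example 16 with 8) would produce exactly as many pieces as there are steps. With only 17 training elements in the dataset, that would also mean keeping the batch size at or below the dataset size.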

Reference: mrq/ai-voice-cloning#436