IndexError: list index out of range #239

Open
opened 2023-05-16 07:06:47 +00:00 by NekoDArk · 2 comments

Hello,
Since I updated this app around two weeks ago, training no longer works. I always get "IndexError: list index out of range" when I try to start training. This happens with multiple voices and training datasets that worked before, as well as with newly created datasets. The full log is:

D:\voice\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
Loading tokenizer JSON: D:\voice\ai-voice-cloning\modules\tortoise-tts\tortoise\../tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: D:\voice\ai-voice-cloning\models\tortoise\autoregressive.pth
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
Downloading dvae.pth from https://huggingface.co/jbetker/tortoise-tts-v2/resolve/3704aea61678e7e468a06d8eea121dba368a798e/.models/dvae.pth...
100% |########################################################################|
Done.
Unloaded TTS
Spawning process: train.bat ./training/Neeko_2/train.yaml
[Training] [2023-05-16T09:02:15.266589]
[Training] [2023-05-16T09:02:15.271590] (venv) D:\voice\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-05-16T09:02:17.973603] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-05-16T09:02:24.989978] 23-05-16 09:02:24.989 - INFO: name: Neeko_2
[Training] [2023-05-16T09:02:24.993980] model: extensibletrainer
[Training] [2023-05-16T09:02:24.996995] scale: 1
[Training] [2023-05-16T09:02:25.000982] gpu_ids: [0]
[Training] [2023-05-16T09:02:25.004982] start_step: 0
[Training] [2023-05-16T09:02:25.008985] checkpointing_enabled: True
[Training] [2023-05-16T09:02:25.011984] fp16: False
[Training] [2023-05-16T09:02:25.015985] bitsandbytes: True
[Training] [2023-05-16T09:02:25.018985] gpus: 1
[Training] [2023-05-16T09:02:25.022986] datasets:[
[Training] [2023-05-16T09:02:25.026988] train:[
[Training] [2023-05-16T09:02:25.029989] name: training
[Training] [2023-05-16T09:02:25.033990] n_workers: 2
[Training] [2023-05-16T09:02:25.036990] batch_size: 71
[Training] [2023-05-16T09:02:25.039990] mode: paired_voice_audio
[Training] [2023-05-16T09:02:25.043991] path: ./training/Neeko_2/train.txt
[Training] [2023-05-16T09:02:25.046992] fetcher_mode: ['lj']
[Training] [2023-05-16T09:02:25.049993] phase: train
[Training] [2023-05-16T09:02:25.052993] max_wav_length: 255995
[Training] [2023-05-16T09:02:25.056994] max_text_length: 200
[Training] [2023-05-16T09:02:25.059995] sample_rate: 22050
[Training] [2023-05-16T09:02:25.062995] load_conditioning: True
[Training] [2023-05-16T09:02:25.065996] num_conditioning_candidates: 2
[Training] [2023-05-16T09:02:25.068997] conditioning_length: 44000
[Training] [2023-05-16T09:02:25.071998] use_bpe_tokenizer: True
[Training] [2023-05-16T09:02:25.073998] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-05-16T09:02:25.079000] load_aligned_codes: False
[Training] [2023-05-16T09:02:25.082000] data_type: img
[Training] [2023-05-16T09:02:25.085001] ]
[Training] [2023-05-16T09:02:25.089003] val:[
[Training] [2023-05-16T09:02:25.092003] name: validation
[Training] [2023-05-16T09:02:25.095003] n_workers: 2
[Training] [2023-05-16T09:02:25.098003] batch_size: 4
[Training] [2023-05-16T09:02:25.101004] mode: paired_voice_audio
[Training] [2023-05-16T09:02:25.105005] path: ./training/Neeko_2/validation.txt
[Training] [2023-05-16T09:02:25.107006] fetcher_mode: ['lj']
[Training] [2023-05-16T09:02:25.110007] phase: val
[Training] [2023-05-16T09:02:25.112007] max_wav_length: 255995
[Training] [2023-05-16T09:02:25.115007] max_text_length: 200
[Training] [2023-05-16T09:02:25.118008] sample_rate: 22050
[Training] [2023-05-16T09:02:25.121009] load_conditioning: True
[Training] [2023-05-16T09:02:25.124009] num_conditioning_candidates: 2
[Training] [2023-05-16T09:02:25.128010] conditioning_length: 44000
[Training] [2023-05-16T09:02:25.131011] use_bpe_tokenizer: True
[Training] [2023-05-16T09:02:25.134012] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-05-16T09:02:25.138013] load_aligned_codes: False
[Training] [2023-05-16T09:02:25.141013] data_type: img
[Training] [2023-05-16T09:02:25.144014] ]
[Training] [2023-05-16T09:02:25.148015] ]
[Training] [2023-05-16T09:02:25.150015] steps:[
[Training] [2023-05-16T09:02:25.153016] gpt_train:[
[Training] [2023-05-16T09:02:25.156017] training: gpt
[Training] [2023-05-16T09:02:25.160017] loss_log_buffer: 500
[Training] [2023-05-16T09:02:25.162018] optimizer: adamw
[Training] [2023-05-16T09:02:25.165019] optimizer_params:[
[Training] [2023-05-16T09:02:25.169019] lr: 0.0001
[Training] [2023-05-16T09:02:25.171020] weight_decay: 0.01
[Training] [2023-05-16T09:02:25.174021] beta1: 0.9
[Training] [2023-05-16T09:02:25.177021] beta2: 0.96
[Training] [2023-05-16T09:02:25.180023] ]
[Training] [2023-05-16T09:02:25.184023] clip_grad_eps: 4
[Training] [2023-05-16T09:02:25.188024] injectors:[
[Training] [2023-05-16T09:02:25.192025] paired_to_mel:[
[Training] [2023-05-16T09:02:25.195025] type: torch_mel_spectrogram
[Training] [2023-05-16T09:02:25.198026] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-05-16T09:02:25.202027] in: wav
[Training] [2023-05-16T09:02:25.204028] out: paired_mel
[Training] [2023-05-16T09:02:25.207029] ]
[Training] [2023-05-16T09:02:25.211029] paired_cond_to_mel:[
[Training] [2023-05-16T09:02:25.215030] type: for_each
[Training] [2023-05-16T09:02:25.218031] subtype: torch_mel_spectrogram
[Training] [2023-05-16T09:02:25.222032] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-05-16T09:02:25.225032] in: conditioning
[Training] [2023-05-16T09:02:25.229033] out: paired_conditioning_mel
[Training] [2023-05-16T09:02:25.232034] ]
[Training] [2023-05-16T09:02:25.236035] to_codes:[
[Training] [2023-05-16T09:02:25.238037] type: discrete_token
[Training] [2023-05-16T09:02:25.241037] in: paired_mel
[Training] [2023-05-16T09:02:25.244037] out: paired_mel_codes
[Training] [2023-05-16T09:02:25.246038] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-05-16T09:02:25.249038] ]
[Training] [2023-05-16T09:02:25.252039] paired_fwd_text:[
[Training] [2023-05-16T09:02:25.254040] type: generator
[Training] [2023-05-16T09:02:25.257040] generator: gpt
[Training] [2023-05-16T09:02:25.260040] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-05-16T09:02:25.263041] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-05-16T09:02:25.265041] ]
[Training] [2023-05-16T09:02:25.268042] ]
[Training] [2023-05-16T09:02:25.271042] losses:[
[Training] [2023-05-16T09:02:25.274043] text_ce:[
[Training] [2023-05-16T09:02:25.277044] type: direct
[Training] [2023-05-16T09:02:25.279044] weight: 0.01
[Training] [2023-05-16T09:02:25.282045] key: loss_text_ce
[Training] [2023-05-16T09:02:25.285047] ]
[Training] [2023-05-16T09:02:25.288048] mel_ce:[
[Training] [2023-05-16T09:02:25.291047] type: direct
[Training] [2023-05-16T09:02:25.296049] weight: 1
[Training] [2023-05-16T09:02:25.299049] key: loss_mel_ce
[Training] [2023-05-16T09:02:25.302050] ]
[Training] [2023-05-16T09:02:25.305051] ]
[Training] [2023-05-16T09:02:25.308053] ]
[Training] [2023-05-16T09:02:25.311052] ]
[Training] [2023-05-16T09:02:25.316054] networks:[
[Training] [2023-05-16T09:02:25.319054] gpt:[
[Training] [2023-05-16T09:02:25.324055] type: generator
[Training] [2023-05-16T09:02:25.331057] which_model_G: unified_voice2
[Training] [2023-05-16T09:02:25.335058] kwargs:[
[Training] [2023-05-16T09:02:25.339058] layers: 30
[Training] [2023-05-16T09:02:25.345060] model_dim: 1024
[Training] [2023-05-16T09:02:25.350060] heads: 16
[Training] [2023-05-16T09:02:25.355062] max_text_tokens: 402
[Training] [2023-05-16T09:02:25.358064] max_mel_tokens: 604
[Training] [2023-05-16T09:02:25.363064] max_conditioning_inputs: 2
[Training] [2023-05-16T09:02:25.366064] mel_length_compression: 1024
[Training] [2023-05-16T09:02:25.369065] number_text_tokens: 256
[Training] [2023-05-16T09:02:25.371065] number_mel_codes: 8194
[Training] [2023-05-16T09:02:25.374066] start_mel_token: 8192
[Training] [2023-05-16T09:02:25.378067] stop_mel_token: 8193
[Training] [2023-05-16T09:02:25.382068] start_text_token: 255
[Training] [2023-05-16T09:02:25.385069] train_solo_embeddings: False
[Training] [2023-05-16T09:02:25.388070] use_mel_codes_as_input: True
[Training] [2023-05-16T09:02:25.391071] checkpointing: True
[Training] [2023-05-16T09:02:25.395071] tortoise_compat: True
[Training] [2023-05-16T09:02:25.398072] ]
[Training] [2023-05-16T09:02:25.400072] ]
[Training] [2023-05-16T09:02:25.403073] ]
[Training] [2023-05-16T09:02:25.406073] path:[
[Training] [2023-05-16T09:02:25.409074] strict_load: True
[Training] [2023-05-16T09:02:25.413075] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-05-16T09:02:25.417076] root: ./
[Training] [2023-05-16T09:02:25.421078] experiments_root: ./training\Neeko_2\finetune
[Training] [2023-05-16T09:02:25.425078] models: ./training\Neeko_2\finetune\models
[Training] [2023-05-16T09:02:25.428078] training_state: ./training\Neeko_2\finetune\training_state
[Training] [2023-05-16T09:02:25.431079] log: ./training\Neeko_2\finetune
[Training] [2023-05-16T09:02:25.433079] val_images: ./training\Neeko_2\finetune\val_images
[Training] [2023-05-16T09:02:25.435080] ]
[Training] [2023-05-16T09:02:25.439082] train:[
[Training] [2023-05-16T09:02:25.442082] niter: 500
[Training] [2023-05-16T09:02:25.445083] warmup_iter: -1
[Training] [2023-05-16T09:02:25.447083] mega_batch_factor: 17
[Training] [2023-05-16T09:02:25.451083] val_freq: 5
[Training] [2023-05-16T09:02:25.454085] ema_enabled: False
[Training] [2023-05-16T09:02:25.456085] default_lr_scheme: MultiStepLR
[Training] [2023-05-16T09:02:25.459085] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50]
[Training] [2023-05-16T09:02:25.461085] lr_gamma: 0.5
[Training] [2023-05-16T09:02:25.464086] ]
[Training] [2023-05-16T09:02:25.468087] eval:[
[Training] [2023-05-16T09:02:25.470087] pure: False
[Training] [2023-05-16T09:02:25.474089] output_state: gen
[Training] [2023-05-16T09:02:25.476089] ]
[Training] [2023-05-16T09:02:25.480091] logger:[
[Training] [2023-05-16T09:02:25.483091] save_checkpoint_freq: 5
[Training] [2023-05-16T09:02:25.485091] visuals: ['gen', 'mel']
[Training] [2023-05-16T09:02:25.489093] visual_debug_rate: 5
[Training] [2023-05-16T09:02:25.492093] is_mel_spectrogram: True
[Training] [2023-05-16T09:02:25.495094] ]
[Training] [2023-05-16T09:02:25.497094] is_train: True
[Training] [2023-05-16T09:02:25.500095] dist: False
[Training] [2023-05-16T09:02:25.503096]
[Training] [2023-05-16T09:02:25.505097] 23-05-16 09:02:24.989 - INFO: Random seed: 185
[Training] [2023-05-16T09:02:26.690350] 23-05-16 09:02:26.690 - INFO: Number of training data elements: 71, iters: 1
[Training] [2023-05-16T09:02:26.694350] 23-05-16 09:02:26.690 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-05-16T09:02:35.598365] D:\voice\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-05-16T09:02:35.602365] warnings.warn(
[Training] [2023-05-16T09:02:44.044487] 23-05-16 09:02:44.044 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-05-16T09:02:44.830665] 23-05-16 09:02:44.829 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-05-16T09:02:47.048611] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-05-16T09:02:49.394141] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-05-16T09:02:51.181547] D:\voice\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-05-16T09:02:51.182546] warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-05-16T09:03:11.201538] Disabled distributed training.
[Training] [2023-05-16T09:03:11.201538] Path already exists. Rename it to [./training\Neeko_2\finetune_archived_230516-090224]
[Training] [2023-05-16T09:03:11.202537] Loading from ./models/tortoise/dvae.pth
[Training] [2023-05-16T09:03:11.203538] Traceback (most recent call last):
[Training] [2023-05-16T09:03:11.204538] File "D:\voice\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-05-16T09:03:11.204538] train(config_path, args.launcher)
[Training] [2023-05-16T09:03:11.204538] File "D:\voice\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-05-16T09:03:11.205538] trainer.do_training()
[Training] [2023-05-16T09:03:11.205538] File "d:\voice\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-05-16T09:03:11.206538] metric = self.do_step(train_data)
[Training] [2023-05-16T09:03:11.206538] File "d:\voice\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-05-16T09:03:11.207538] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-05-16T09:03:11.207538] File "d:\voice\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-05-16T09:03:11.208539] ns = step.do_forward_backward(
[Training] [2023-05-16T09:03:11.208539] File "d:\voice\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-05-16T09:03:11.209539] local_state[k] = v[grad_accum_step]
[Training] [2023-05-16T09:03:11.209539] IndexError: list index out of range

Owner

Your batch size isn't evenly divisible by your gradient accumulation size. Stick to even numbers for both values.

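For reference, the numbers in the log above line up with that diagnosis. With batch_size: 71 and mega_batch_factor: 17, splitting a batch into 17 chunks the way torch.chunk does actually produces only 15 of them, so indexing the chunk list by grad_accum_step runs off the end, which is exactly the `local_state[k] = v[grad_accum_step]` line in the traceback. A minimal sketch of the failure mode (an assumption about the splitting behavior, not the actual DLAS code):

```python
# Minimal sketch of the failure mode. Assumption: the trainer splits each
# batch into mega_batch_factor chunks, torch.chunk-style; this is NOT the
# actual DLAS code. Values taken from the train.yaml in the log above.
import torch

batch_size = 71         # datasets.train.batch_size
mega_batch_factor = 17  # train.mega_batch_factor (gradient accumulation)

batch = torch.zeros(batch_size, 8)  # stand-in for one batch tensor
chunks = torch.chunk(batch, mega_batch_factor, dim=0)

# torch.chunk can return FEWER chunks than requested: the chunk size is
# ceil(71 / 17) = 5, giving 14 full chunks + 1 partial chunk = 15 total.
print(len(chunks))  # 15

for grad_accum_step in range(mega_batch_factor):
    micro_batch = chunks[grad_accum_step]  # IndexError once step reaches 15
```

Making the batch size a clean multiple of the accumulation size (e.g. 68 and 17, or 64 and 16) avoids the short chunk list entirely.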

I got the same issue today after updating for the first time in a while. Is this even-divisibility constraint something we could add to the "Validate Training Configuration" option when it determines our recommended settings?
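
A pre-flight check along those lines would be cheap. Below is a hypothetical sketch (validate_batch_settings is not an existing function in this repo) of the kind of test the validator could run, falling back to the largest accumulation size that still divides the batch evenly:

```python
# Hypothetical helper: NOT an existing function in this repo, just a sketch
# of what "Validate Training Configuration" could check before training.
def validate_batch_settings(batch_size: int, grad_accum_size: int) -> int:
    """Return a gradient accumulation size that evenly divides batch_size.

    If the requested size does not divide evenly, fall back to the largest
    divisor of batch_size that is <= the requested size, rather than letting
    training crash later with "IndexError: list index out of range".
    """
    for candidate in range(min(grad_accum_size, batch_size), 0, -1):
        if batch_size % candidate == 0:
            return candidate
    return 1  # unreachable in practice, since 1 always divides batch_size

print(validate_batch_settings(71, 17))  # 1 (71 is prime: only 1 and 71 divide it)
print(validate_batch_settings(64, 17))  # 16
```

Note that with the settings from the log above there is no good fallback: 71 is prime, so the only accumulation sizes that divide it evenly are 1 and 71.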
