training #295

Open
opened 2023-07-05 20:53:43 +00:00 by Gura_Shark · 5 comments

Training suddenly fails with "error":

Spawning process: train.bat ./training/white mask varre/train.yaml
[Training] [2023-07-05T22:34:14.925564]
[Training] [2023-07-05T22:34:14.929151] (venv) C:\Users\A\Desktop\T TTS\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-07-05T22:34:16.715450] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-07-05T22:34:17.738527] 23-07-05 22:34:17.738 - INFO: name: white mask varre
[Training] [2023-07-05T22:34:17.741528] model: extensibletrainer
[Training] [2023-07-05T22:34:17.744039] scale: 1
[Training] [2023-07-05T22:34:17.746036] gpu_ids: [0]
[Training] [2023-07-05T22:34:17.748058] start_step: 0
[Training] [2023-07-05T22:34:17.750056] checkpointing_enabled: True
[Training] [2023-07-05T22:34:17.752576] fp16: False
[Training] [2023-07-05T22:34:17.753575] bitsandbytes: True
[Training] [2023-07-05T22:34:17.756082] gpus: 1
[Training] [2023-07-05T22:34:17.758094] datasets:[
[Training] [2023-07-05T22:34:17.759086] train:[
[Training] [2023-07-05T22:34:17.761598] name: training
[Training] [2023-07-05T22:34:17.763595] n_workers: 2
[Training] [2023-07-05T22:34:17.764992] batch_size: 102
[Training] [2023-07-05T22:34:17.766985] mode: paired_voice_audio
[Training] [2023-07-05T22:34:17.768989] path: ./training/white mask varre/train.txt
[Training] [2023-07-05T22:34:17.769990] fetcher_mode: ['lj']
[Training] [2023-07-05T22:34:17.772172] phase: train
[Training] [2023-07-05T22:34:17.774170] max_wav_length: 255995
[Training] [2023-07-05T22:34:17.775679] max_text_length: 200
[Training] [2023-07-05T22:34:17.777677] sample_rate: 22050
[Training] [2023-07-05T22:34:17.779183] load_conditioning: True
[Training] [2023-07-05T22:34:17.781693] num_conditioning_candidates: 2
[Training] [2023-07-05T22:34:17.782690] conditioning_length: 44000
[Training] [2023-07-05T22:34:17.785200] use_bpe_tokenizer: True
[Training] [2023-07-05T22:34:17.787198] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-07-05T22:34:17.788718] load_aligned_codes: False
[Training] [2023-07-05T22:34:17.790715] data_type: img
[Training] [2023-07-05T22:34:17.792718] ]
[Training] [2023-07-05T22:34:17.795228] val:[
[Training] [2023-07-05T22:34:17.796226] name: validation
[Training] [2023-07-05T22:34:17.798736] n_workers: 2
[Training] [2023-07-05T22:34:17.800734] batch_size: 0
[Training] [2023-07-05T22:34:17.802739] mode: paired_voice_audio
[Training] [2023-07-05T22:34:17.805251] path: ./training/white mask varre/validation.txt
[Training] [2023-07-05T22:34:17.806247] fetcher_mode: ['lj']
[Training] [2023-07-05T22:34:17.808758] phase: val
[Training] [2023-07-05T22:34:17.810756] max_wav_length: 255995
[Training] [2023-07-05T22:34:17.812778] max_text_length: 200
[Training] [2023-07-05T22:34:17.815286] sample_rate: 22050
[Training] [2023-07-05T22:34:17.817284] load_conditioning: True
[Training] [2023-07-05T22:34:17.818797] num_conditioning_candidates: 2
[Training] [2023-07-05T22:34:17.820793] conditioning_length: 44000
[Training] [2023-07-05T22:34:17.822797] use_bpe_tokenizer: True
[Training] [2023-07-05T22:34:17.825306] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-07-05T22:34:17.826304] load_aligned_codes: False
[Training] [2023-07-05T22:34:17.828813] data_type: img
[Training] [2023-07-05T22:34:17.830812] ]
[Training] [2023-07-05T22:34:17.832815] ]
[Training] [2023-07-05T22:34:17.833816] steps:[
[Training] [2023-07-05T22:34:17.835818] gpt_train:[
[Training] [2023-07-05T22:34:17.838325] training: gpt
[Training] [2023-07-05T22:34:17.840829] loss_log_buffer: 500
[Training] [2023-07-05T22:34:17.841845] optimizer: adamw
[Training] [2023-07-05T22:34:17.843843] optimizer_params:[
[Training] [2023-07-05T22:34:17.845351] lr: 0.0001
[Training] [2023-07-05T22:34:17.847352] weight_decay: 0.01
[Training] [2023-07-05T22:34:17.849393] beta1: 0.9
[Training] [2023-07-05T22:34:17.851900] beta2: 0.96
[Training] [2023-07-05T22:34:17.853413] ]
[Training] [2023-07-05T22:34:17.855409] clip_grad_eps: 4
[Training] [2023-07-05T22:34:17.857915] injectors:[
[Training] [2023-07-05T22:34:17.858916] paired_to_mel:[
[Training] [2023-07-05T22:34:17.861426] type: torch_mel_spectrogram
[Training] [2023-07-05T22:34:17.862427] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-07-05T22:34:17.864933] in: wav
[Training] [2023-07-05T22:34:17.866935] out: paired_mel
[Training] [2023-07-05T22:34:17.868937] ]
[Training] [2023-07-05T22:34:17.870444] paired_cond_to_mel:[
[Training] [2023-07-05T22:34:17.872463] type: for_each
[Training] [2023-07-05T22:34:17.874461] subtype: torch_mel_spectrogram
[Training] [2023-07-05T22:34:17.875969] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-07-05T22:34:17.877969] in: conditioning
[Training] [2023-07-05T22:34:17.879474] out: paired_conditioning_mel
[Training] [2023-07-05T22:34:17.881988] ]
[Training] [2023-07-05T22:34:17.884002] to_codes:[
[Training] [2023-07-05T22:34:17.886004] type: discrete_token
[Training] [2023-07-05T22:34:17.888513] in: paired_mel
[Training] [2023-07-05T22:34:17.889511] out: paired_mel_codes
[Training] [2023-07-05T22:34:17.892024] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-07-05T22:34:17.894035] ]
[Training] [2023-07-05T22:34:17.896038] paired_fwd_text:[
[Training] [2023-07-05T22:34:17.897544] type: generator
[Training] [2023-07-05T22:34:17.899546] generator: gpt
[Training] [2023-07-05T22:34:17.901066] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-07-05T22:34:17.903069] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-07-05T22:34:17.905069] ]
[Training] [2023-07-05T22:34:17.906575] ]
[Training] [2023-07-05T22:34:17.908082] losses:[
[Training] [2023-07-05T22:34:17.910081] text_ce:[
[Training] [2023-07-05T22:34:17.912655] type: direct
[Training] [2023-07-05T22:34:17.914160] weight: 0.01
[Training] [2023-07-05T22:34:17.916162] key: loss_text_ce
[Training] [2023-07-05T22:34:17.917666] ]
[Training] [2023-07-05T22:34:17.919668] mel_ce:[
[Training] [2023-07-05T22:34:17.921173] type: direct
[Training] [2023-07-05T22:34:17.923182] weight: 1
[Training] [2023-07-05T22:34:17.925180] key: loss_mel_ce
[Training] [2023-07-05T22:34:17.926724] ]
[Training] [2023-07-05T22:34:17.928686] ]
[Training] [2023-07-05T22:34:17.930864] ]
[Training] [2023-07-05T22:34:17.931865] ]
[Training] [2023-07-05T22:34:17.934383] networks:[
[Training] [2023-07-05T22:34:17.936384] gpt:[
[Training] [2023-07-05T22:34:17.937889] type: generator
[Training] [2023-07-05T22:34:17.939396] which_model_G: unified_voice2
[Training] [2023-07-05T22:34:17.941397] kwargs:[
[Training] [2023-07-05T22:34:17.943903] layers: 30
[Training] [2023-07-05T22:34:17.945414] model_dim: 1024
[Training] [2023-07-05T22:34:17.947412] heads: 16
[Training] [2023-07-05T22:34:17.949413] max_text_tokens: 402
[Training] [2023-07-05T22:34:17.950921] max_mel_tokens: 604
[Training] [2023-07-05T22:34:17.952925] max_conditioning_inputs: 2
[Training] [2023-07-05T22:34:17.954929] mel_length_compression: 1024
[Training] [2023-07-05T22:34:17.956927] number_text_tokens: 256
[Training] [2023-07-05T22:34:17.957970] number_mel_codes: 8194
[Training] [2023-07-05T22:34:17.959931] start_mel_token: 8192
[Training] [2023-07-05T22:34:17.961948] stop_mel_token: 8193
[Training] [2023-07-05T22:34:17.963452] start_text_token: 255
[Training] [2023-07-05T22:34:17.965457] train_solo_embeddings: False
[Training] [2023-07-05T22:34:17.967459] use_mel_codes_as_input: True
[Training] [2023-07-05T22:34:17.969964] checkpointing: True
[Training] [2023-07-05T22:34:17.971473] tortoise_compat: True
[Training] [2023-07-05T22:34:17.973488] ]
[Training] [2023-07-05T22:34:17.974489] ]
[Training] [2023-07-05T22:34:17.976994] ]
[Training] [2023-07-05T22:34:17.977996] path:[
[Training] [2023-07-05T22:34:17.980502] strict_load: True
[Training] [2023-07-05T22:34:17.982512] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-07-05T22:34:17.984507] root: ./
[Training] [2023-07-05T22:34:17.985989] experiments_root: ./training\white mask varre\finetune
[Training] [2023-07-05T22:34:17.987984] models: ./training\white mask varre\finetune\models
[Training] [2023-07-05T22:34:17.989492] training_state: ./training\white mask varre\finetune\training_state
[Training] [2023-07-05T22:34:17.991492] log: ./training\white mask varre\finetune
[Training] [2023-07-05T22:34:17.993027] val_images: ./training\white mask varre\finetune\val_images
[Training] [2023-07-05T22:34:17.996026] ]
[Training] [2023-07-05T22:34:17.997532] train:[
[Training] [2023-07-05T22:34:17.999534] niter: 200
[Training] [2023-07-05T22:34:18.001038] warmup_iter: -1
[Training] [2023-07-05T22:34:18.003048] mega_batch_factor: 25
[Training] [2023-07-05T22:34:18.005044] val_freq: 5
[Training] [2023-07-05T22:34:18.006553] ema_enabled: False
[Training] [2023-07-05T22:34:18.009059] default_lr_scheme: MultiStepLR
[Training] [2023-07-05T22:34:18.011057] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50, 59]
[Training] [2023-07-05T22:34:18.013061] lr_gamma: 0.5
[Training] [2023-07-05T22:34:18.015064] ]
[Training] [2023-07-05T22:34:18.017068] eval:[
[Training] [2023-07-05T22:34:18.019070] pure: False
[Training] [2023-07-05T22:34:18.020576] output_state: gen
[Training] [2023-07-05T22:34:18.022579] ]
[Training] [2023-07-05T22:34:18.024084] logger:[
[Training] [2023-07-05T22:34:18.026086] save_checkpoint_freq: 5
[Training] [2023-07-05T22:34:18.027591] visuals: ['gen', 'mel']
[Training] [2023-07-05T22:34:18.030100] visual_debug_rate: 5
[Training] [2023-07-05T22:34:18.031097] is_mel_spectrogram: True
[Training] [2023-07-05T22:34:18.033107] ]
[Training] [2023-07-05T22:34:18.035103] is_train: True
[Training] [2023-07-05T22:34:18.036616] dist: False
[Training] [2023-07-05T22:34:18.038611]
[Training] [2023-07-05T22:34:18.040118] 23-07-05 22:34:17.738 - INFO: Random seed: 1217
[Training] [2023-07-05T22:34:18.785112] 23-07-05 22:34:18.785 - INFO: Number of training data elements: 102, iters: 1
[Training] [2023-07-05T22:34:18.787634] 23-07-05 22:34:18.785 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-07-05T22:34:19.647798] C:\Users\A\Desktop\T TTS\ai-voice-cloning\venv\Lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-07-05T22:34:19.651310] warnings.warn(
[Training] [2023-07-05T22:34:26.250082] 23-07-05 22:34:26.250 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-07-05T22:34:26.911135] 23-07-05 22:34:26.905 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-07-05T22:34:28.832499] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-07-05T22:34:30.872644] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-07-05T22:34:31.550691] C:\Users\A\Desktop\T TTS\ai-voice-cloning\venv\Lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-07-05T22:34:31.551197] warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-07-05T22:35:03.830276] Disabled distributed training.
[Training] [2023-07-05T22:35:03.830276] Path already exists. Rename it to [./training\white mask varre\finetune_archived_230705-223417]
[Training] [2023-07-05T22:35:03.830276] Loading from ./models/tortoise/dvae.pth
[Training] [2023-07-05T22:35:03.831277] Traceback (most recent call last):
[Training] [2023-07-05T22:35:03.831277] File "C:\Users\A\Desktop\T TTS\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-07-05T22:35:03.831277] train(config_path, args.launcher)
[Training] [2023-07-05T22:35:03.831277] File "C:\Users\A\Desktop\T TTS\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-07-05T22:35:03.831277] trainer.do_training()
[Training] [2023-07-05T22:35:03.831277] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-07-05T22:35:03.831277] metric = self.do_step(train_data)
[Training] [2023-07-05T22:35:03.831277] ^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.831277] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-07-05T22:35:03.832278] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-07-05T22:35:03.832278] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.832278] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-07-05T22:35:03.832278] ns = step.do_forward_backward(
[Training] [2023-07-05T22:35:03.832783] ^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.832783] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-07-05T22:35:03.832783] local_state[k] = v[grad_accum_step]
[Training] [2023-07-05T22:35:03.832783] ~^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.832783] IndexError: list index out of range
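
The traceback points at the gradient-accumulation indexing in `steps.py`. Here is a minimal sketch of what is likely going wrong, assuming DLAS splits each batch into `mega_batch_factor` chunks with something like `torch.chunk` (an assumption inferred from the `v[grad_accum_step]` indexing, not confirmed against the DLAS source): the config above uses `batch_size: 102` with `mega_batch_factor: 25`, and `torch.chunk` can return fewer chunks than requested, so iterating over 25 accumulation steps runs off the end of the chunk list.

```python
import torch

# Values from the log above: the whole dataset (102 elements) fits in one
# batch (batch_size: 102), and train.mega_batch_factor is 25.
batch = torch.zeros(102, 1)

# torch.chunk may return FEWER chunks than requested when the dim size
# doesn't divide evenly: chunk size = ceil(102 / 25) = 5, so only
# ceil(102 / 5) = 21 chunks come back instead of 25.
chunks = torch.chunk(batch, 25, dim=0)
print(len(chunks))  # -> 21

# Looping over all 25 accumulation steps then indexes past the end,
# matching "local_state[k] = v[grad_accum_step]" in the traceback.
for grad_accum_step in range(25):
    local_state = chunks[grad_accum_step]  # IndexError at step 21
```

If that is the mechanism, keeping `mega_batch_factor` small relative to the effective batch size (or picking values that divide evenly) would avoid the out-of-range step.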

suddenly says "error" Spawning process: train.bat ./training/white mask varre/train.yaml [Training] [2023-07-05T22:34:14.925564] [Training] [2023-07-05T22:34:14.929151] (venv) C:\Users\A\Desktop\T TTS\ai-voice-cloning>call .\venv\Scripts\activate.bat [Training] [2023-07-05T22:34:16.715450] NOTE: Redirects are currently not supported in Windows or MacOs. [Training] [2023-07-05T22:34:17.738527] 23-07-05 22:34:17.738 - INFO: name: white mask varre [Training] [2023-07-05T22:34:17.741528] model: extensibletrainer [Training] [2023-07-05T22:34:17.744039] scale: 1 [Training] [2023-07-05T22:34:17.746036] gpu_ids: [0] [Training] [2023-07-05T22:34:17.748058] start_step: 0 [Training] [2023-07-05T22:34:17.750056] checkpointing_enabled: True [Training] [2023-07-05T22:34:17.752576] fp16: False [Training] [2023-07-05T22:34:17.753575] bitsandbytes: True [Training] [2023-07-05T22:34:17.756082] gpus: 1 [Training] [2023-07-05T22:34:17.758094] datasets:[ [Training] [2023-07-05T22:34:17.759086] train:[ [Training] [2023-07-05T22:34:17.761598] name: training [Training] [2023-07-05T22:34:17.763595] n_workers: 2 [Training] [2023-07-05T22:34:17.764992] batch_size: 102 [Training] [2023-07-05T22:34:17.766985] mode: paired_voice_audio [Training] [2023-07-05T22:34:17.768989] path: ./training/white mask varre/train.txt [Training] [2023-07-05T22:34:17.769990] fetcher_mode: ['lj'] [Training] [2023-07-05T22:34:17.772172] phase: train [Training] [2023-07-05T22:34:17.774170] max_wav_length: 255995 [Training] [2023-07-05T22:34:17.775679] max_text_length: 200 [Training] [2023-07-05T22:34:17.777677] sample_rate: 22050 [Training] [2023-07-05T22:34:17.779183] load_conditioning: True [Training] [2023-07-05T22:34:17.781693] num_conditioning_candidates: 2 [Training] [2023-07-05T22:34:17.782690] conditioning_length: 44000 [Training] [2023-07-05T22:34:17.785200] use_bpe_tokenizer: True [Training] [2023-07-05T22:34:17.787198] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json [Training] [2023-07-05T22:34:17.788718] load_aligned_codes: False [Training] [2023-07-05T22:34:17.790715] data_type: img [Training] [2023-07-05T22:34:17.792718] ] [Training] [2023-07-05T22:34:17.795228] val:[ [Training] [2023-07-05T22:34:17.796226] name: validation [Training] [2023-07-05T22:34:17.798736] n_workers: 2 [Training] [2023-07-05T22:34:17.800734] batch_size: 0 [Training] [2023-07-05T22:34:17.802739] mode: paired_voice_audio [Training] [2023-07-05T22:34:17.805251] path: ./training/white mask varre/validation.txt [Training] [2023-07-05T22:34:17.806247] fetcher_mode: ['lj'] [Training] [2023-07-05T22:34:17.808758] phase: val [Training] [2023-07-05T22:34:17.810756] max_wav_length: 255995 [Training] [2023-07-05T22:34:17.812778] max_text_length: 200 [Training] [2023-07-05T22:34:17.815286] sample_rate: 22050 [Training] [2023-07-05T22:34:17.817284] load_conditioning: True [Training] [2023-07-05T22:34:17.818797] num_conditioning_candidates: 2 [Training] [2023-07-05T22:34:17.820793] conditioning_length: 44000 [Training] [2023-07-05T22:34:17.822797] use_bpe_tokenizer: True [Training] [2023-07-05T22:34:17.825306] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json [Training] [2023-07-05T22:34:17.826304] load_aligned_codes: False [Training] [2023-07-05T22:34:17.828813] data_type: img [Training] [2023-07-05T22:34:17.830812] ] [Training] [2023-07-05T22:34:17.832815] ] [Training] [2023-07-05T22:34:17.833816] steps:[ [Training] [2023-07-05T22:34:17.835818] gpt_train:[ [Training] [2023-07-05T22:34:17.838325] training: gpt [Training] 
[2023-07-05T22:34:17.840829] loss_log_buffer: 500 [Training] [2023-07-05T22:34:17.841845] optimizer: adamw [Training] [2023-07-05T22:34:17.843843] optimizer_params:[ [Training] [2023-07-05T22:34:17.845351] lr: 0.0001 [Training] [2023-07-05T22:34:17.847352] weight_decay: 0.01 [Training] [2023-07-05T22:34:17.849393] beta1: 0.9 [Training] [2023-07-05T22:34:17.851900] beta2: 0.96 [Training] [2023-07-05T22:34:17.853413] ] [Training] [2023-07-05T22:34:17.855409] clip_grad_eps: 4 [Training] [2023-07-05T22:34:17.857915] injectors:[ [Training] [2023-07-05T22:34:17.858916] paired_to_mel:[ [Training] [2023-07-05T22:34:17.861426] type: torch_mel_spectrogram [Training] [2023-07-05T22:34:17.862427] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth [Training] [2023-07-05T22:34:17.864933] in: wav [Training] [2023-07-05T22:34:17.866935] out: paired_mel [Training] [2023-07-05T22:34:17.868937] ] [Training] [2023-07-05T22:34:17.870444] paired_cond_to_mel:[ [Training] [2023-07-05T22:34:17.872463] type: for_each [Training] [2023-07-05T22:34:17.874461] subtype: torch_mel_spectrogram [Training] [2023-07-05T22:34:17.875969] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth [Training] [2023-07-05T22:34:17.877969] in: conditioning [Training] [2023-07-05T22:34:17.879474] out: paired_conditioning_mel [Training] [2023-07-05T22:34:17.881988] ] [Training] [2023-07-05T22:34:17.884002] to_codes:[ [Training] [2023-07-05T22:34:17.886004] type: discrete_token [Training] [2023-07-05T22:34:17.888513] in: paired_mel [Training] [2023-07-05T22:34:17.889511] out: paired_mel_codes [Training] [2023-07-05T22:34:17.892024] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml [Training] [2023-07-05T22:34:17.894035] ] [Training] [2023-07-05T22:34:17.896038] paired_fwd_text:[ [Training] [2023-07-05T22:34:17.897544] type: generator [Training] [2023-07-05T22:34:17.899546] generator: gpt [Training] [2023-07-05T22:34:17.901066] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths'] [Training] [2023-07-05T22:34:17.903069] out: ['loss_text_ce', 'loss_mel_ce', 'logits'] [Training] [2023-07-05T22:34:17.905069] ] [Training] [2023-07-05T22:34:17.906575] ] [Training] [2023-07-05T22:34:17.908082] losses:[ [Training] [2023-07-05T22:34:17.910081] text_ce:[ [Training] [2023-07-05T22:34:17.912655] type: direct [Training] [2023-07-05T22:34:17.914160] weight: 0.01 [Training] [2023-07-05T22:34:17.916162] key: loss_text_ce [Training] [2023-07-05T22:34:17.917666] ] [Training] [2023-07-05T22:34:17.919668] mel_ce:[ [Training] [2023-07-05T22:34:17.921173] type: direct [Training] [2023-07-05T22:34:17.923182] weight: 1 [Training] [2023-07-05T22:34:17.925180] key: loss_mel_ce [Training] [2023-07-05T22:34:17.926724] ] [Training] [2023-07-05T22:34:17.928686] ] [Training] [2023-07-05T22:34:17.930864] ] [Training] [2023-07-05T22:34:17.931865] ] [Training] [2023-07-05T22:34:17.934383] networks:[ [Training] [2023-07-05T22:34:17.936384] gpt:[ [Training] [2023-07-05T22:34:17.937889] type: generator [Training] [2023-07-05T22:34:17.939396] which_model_G: unified_voice2 [Training] [2023-07-05T22:34:17.941397] kwargs:[ [Training] [2023-07-05T22:34:17.943903] layers: 30 [Training] [2023-07-05T22:34:17.945414] model_dim: 1024 [Training] [2023-07-05T22:34:17.947412] heads: 16 [Training] [2023-07-05T22:34:17.949413] max_text_tokens: 402 [Training] [2023-07-05T22:34:17.950921] max_mel_tokens: 604 [Training] [2023-07-05T22:34:17.952925] max_conditioning_inputs: 2 [Training] 
[2023-07-05T22:34:17.954929] mel_length_compression: 1024 [Training] [2023-07-05T22:34:17.956927] number_text_tokens: 256 [Training] [2023-07-05T22:34:17.957970] number_mel_codes: 8194 [Training] [2023-07-05T22:34:17.959931] start_mel_token: 8192 [Training] [2023-07-05T22:34:17.961948] stop_mel_token: 8193 [Training] [2023-07-05T22:34:17.963452] start_text_token: 255 [Training] [2023-07-05T22:34:17.965457] train_solo_embeddings: False [Training] [2023-07-05T22:34:17.967459] use_mel_codes_as_input: True [Training] [2023-07-05T22:34:17.969964] checkpointing: True [Training] [2023-07-05T22:34:17.971473] tortoise_compat: True [Training] [2023-07-05T22:34:17.973488] ] [Training] [2023-07-05T22:34:17.974489] ] [Training] [2023-07-05T22:34:17.976994] ] [Training] [2023-07-05T22:34:17.977996] path:[ [Training] [2023-07-05T22:34:17.980502] strict_load: True [Training] [2023-07-05T22:34:17.982512] pretrain_model_gpt: ./models/tortoise/autoregressive.pth [Training] [2023-07-05T22:34:17.984507] root: ./ [Training] [2023-07-05T22:34:17.985989] experiments_root: ./training\white mask varre\finetune [Training] [2023-07-05T22:34:17.987984] models: ./training\white mask varre\finetune\models [Training] [2023-07-05T22:34:17.989492] training_state: ./training\white mask varre\finetune\training_state [Training] [2023-07-05T22:34:17.991492] log: ./training\white mask varre\finetune [Training] [2023-07-05T22:34:17.993027] val_images: ./training\white mask varre\finetune\val_images [Training] [2023-07-05T22:34:17.996026] ] [Training] [2023-07-05T22:34:17.997532] train:[ [Training] [2023-07-05T22:34:17.999534] niter: 200 [Training] [2023-07-05T22:34:18.001038] warmup_iter: -1 [Training] [2023-07-05T22:34:18.003048] mega_batch_factor: 25 [Training] [2023-07-05T22:34:18.005044] val_freq: 5 [Training] [2023-07-05T22:34:18.006553] ema_enabled: False [Training] [2023-07-05T22:34:18.009059] default_lr_scheme: MultiStepLR [Training] [2023-07-05T22:34:18.011057] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50, 59] [Training] [2023-07-05T22:34:18.013061] lr_gamma: 0.5 [Training] [2023-07-05T22:34:18.015064] ] [Training] [2023-07-05T22:34:18.017068] eval:[ [Training] [2023-07-05T22:34:18.019070] pure: False [Training] [2023-07-05T22:34:18.020576] output_state: gen [Training] [2023-07-05T22:34:18.022579] ] [Training] [2023-07-05T22:34:18.024084] logger:[ [Training] [2023-07-05T22:34:18.026086] save_checkpoint_freq: 5 [Training] [2023-07-05T22:34:18.027591] visuals: ['gen', 'mel'] [Training] [2023-07-05T22:34:18.030100] visual_debug_rate: 5 [Training] [2023-07-05T22:34:18.031097] is_mel_spectrogram: True [Training] [2023-07-05T22:34:18.033107] ] [Training] [2023-07-05T22:34:18.035103] is_train: True [Training] [2023-07-05T22:34:18.036616] dist: False [Training] [2023-07-05T22:34:18.038611] [Training] [2023-07-05T22:34:18.040118] 23-07-05 22:34:17.738 - INFO: Random seed: 1217 [Training] [2023-07-05T22:34:18.785112] 23-07-05 22:34:18.785 - INFO: Number of training data elements: 102, iters: 1 [Training] [2023-07-05T22:34:18.787634] 23-07-05 22:34:18.785 - INFO: Total epochs needed: 200 for iters 200 [Training] [2023-07-05T22:34:19.647798] C:\Users\A\Desktop\T TTS\ai-voice-cloning\venv\Lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`. 
[Training] [2023-07-05T22:34:19.651310] warnings.warn( [Training] [2023-07-05T22:34:26.250082] 23-07-05 22:34:26.250 - INFO: Loading model for [./models/tortoise/autoregressive.pth] [Training] [2023-07-05T22:34:26.911135] 23-07-05 22:34:26.905 - INFO: Start training from epoch: 0, iter: 0 [Training] [2023-07-05T22:34:28.832499] NOTE: Redirects are currently not supported in Windows or MacOs. [Training] [2023-07-05T22:34:30.872644] NOTE: Redirects are currently not supported in Windows or MacOs. [Training] [2023-07-05T22:34:31.550691] C:\Users\A\Desktop\T TTS\ai-voice-cloning\venv\Lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate [Training] [2023-07-05T22:34:31.551197] warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " [Training] [2023-07-05T22:35:03.830276] Disabled distributed training. [Training] [2023-07-05T22:35:03.830276] Path already exists. Rename it to [./training\white mask varre\finetune_archived_230705-223417] [Training] [2023-07-05T22:35:03.830276] Loading from ./models/tortoise/dvae.pth [Training] [2023-07-05T22:35:03.831277] Traceback (most recent call last): [Training] [2023-07-05T22:35:03.831277] File "C:\Users\A\Desktop\T TTS\ai-voice-cloning\src\train.py", line 64, in <module> [Training] [2023-07-05T22:35:03.831277] train(config_path, args.launcher) [Training] [2023-07-05T22:35:03.831277] File "C:\Users\A\Desktop\T TTS\ai-voice-cloning\src\train.py", line 31, in train [Training] [2023-07-05T22:35:03.831277] trainer.do_training() [Training] [2023-07-05T22:35:03.831277] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training [Training] [2023-07-05T22:35:03.831277] metric = self.do_step(train_data) [Training] [2023-07-05T22:35:03.831277] ^^^^^^^^^^^^^^^^^^^^^^^^ [Training] [2023-07-05T22:35:03.831277] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step [Training] [2023-07-05T22:35:03.832278] gradient_norms_dict = self.model.optimize_parameters( [Training] [2023-07-05T22:35:03.832278] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [Training] [2023-07-05T22:35:03.832278] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters [Training] [2023-07-05T22:35:03.832278] ns = step.do_forward_backward( [Training] [2023-07-05T22:35:03.832783] ^^^^^^^^^^^^^^^^^^^^^^^^^ [Training] [2023-07-05T22:35:03.832783] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward [Training] [2023-07-05T22:35:03.832783] local_state[k] = v[grad_accum_step] [Training] [2023-07-05T22:35:03.832783] ~^^^^^^^^^^^^^^^^^ [Training] [2023-07-05T22:35:03.832783] IndexError: list index out of range
Author

and then it just freezes everything

How much VRAM do you have? If it's 8GB or less then knock the # of training elements down to 96 and try with a batch size of 32.

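For reference, here is a sketch of how that suggestion would map onto `train.yaml`, using the field names from the config dump above (the `mega_batch_factor` value is an added assumption, not part of the suggestion):

```yaml
# Hypothetical edit to ./training/white mask varre/train.yaml,
# reflecting the suggestion above; field names taken from the log.
datasets:
  train:
    batch_size: 32        # was 102 (the entire dataset in one batch)
train:
  mega_batch_factor: 8    # was 25; assumption: keep this well below
                          # batch_size so the chunk split yields enough
                          # chunks for every accumulation step
```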
Author

16 GB GDDR6X

That should be more than enough but wouldn't hurt to try 96 anyway.

I have the same issue. 12 GB of VRAM; I tried adjusting the batch size, but that didn't resolve it.

Reference: mrq/ai-voice-cloning#295