training #295
Training suddenly says "error":
Spawning process: train.bat ./training/white mask varre/train.yaml
[Training] [2023-07-05T22:34:14.925564]
[Training] [2023-07-05T22:34:14.929151] (venv) C:\Users\A\Desktop\T TTS\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-07-05T22:34:16.715450] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-07-05T22:34:17.738527] 23-07-05 22:34:17.738 - INFO: name: white mask varre
[Training] [2023-07-05T22:34:17.741528] model: extensibletrainer
[Training] [2023-07-05T22:34:17.744039] scale: 1
[Training] [2023-07-05T22:34:17.746036] gpu_ids: [0]
[Training] [2023-07-05T22:34:17.748058] start_step: 0
[Training] [2023-07-05T22:34:17.750056] checkpointing_enabled: True
[Training] [2023-07-05T22:34:17.752576] fp16: False
[Training] [2023-07-05T22:34:17.753575] bitsandbytes: True
[Training] [2023-07-05T22:34:17.756082] gpus: 1
[Training] [2023-07-05T22:34:17.758094] datasets:[
[Training] [2023-07-05T22:34:17.759086] train:[
[Training] [2023-07-05T22:34:17.761598] name: training
[Training] [2023-07-05T22:34:17.763595] n_workers: 2
[Training] [2023-07-05T22:34:17.764992] batch_size: 102
[Training] [2023-07-05T22:34:17.766985] mode: paired_voice_audio
[Training] [2023-07-05T22:34:17.768989] path: ./training/white mask varre/train.txt
[Training] [2023-07-05T22:34:17.769990] fetcher_mode: ['lj']
[Training] [2023-07-05T22:34:17.772172] phase: train
[Training] [2023-07-05T22:34:17.774170] max_wav_length: 255995
[Training] [2023-07-05T22:34:17.775679] max_text_length: 200
[Training] [2023-07-05T22:34:17.777677] sample_rate: 22050
[Training] [2023-07-05T22:34:17.779183] load_conditioning: True
[Training] [2023-07-05T22:34:17.781693] num_conditioning_candidates: 2
[Training] [2023-07-05T22:34:17.782690] conditioning_length: 44000
[Training] [2023-07-05T22:34:17.785200] use_bpe_tokenizer: True
[Training] [2023-07-05T22:34:17.787198] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-07-05T22:34:17.788718] load_aligned_codes: False
[Training] [2023-07-05T22:34:17.790715] data_type: img
[Training] [2023-07-05T22:34:17.792718] ]
[Training] [2023-07-05T22:34:17.795228] val:[
[Training] [2023-07-05T22:34:17.796226] name: validation
[Training] [2023-07-05T22:34:17.798736] n_workers: 2
[Training] [2023-07-05T22:34:17.800734] batch_size: 0
[Training] [2023-07-05T22:34:17.802739] mode: paired_voice_audio
[Training] [2023-07-05T22:34:17.805251] path: ./training/white mask varre/validation.txt
[Training] [2023-07-05T22:34:17.806247] fetcher_mode: ['lj']
[Training] [2023-07-05T22:34:17.808758] phase: val
[Training] [2023-07-05T22:34:17.810756] max_wav_length: 255995
[Training] [2023-07-05T22:34:17.812778] max_text_length: 200
[Training] [2023-07-05T22:34:17.815286] sample_rate: 22050
[Training] [2023-07-05T22:34:17.817284] load_conditioning: True
[Training] [2023-07-05T22:34:17.818797] num_conditioning_candidates: 2
[Training] [2023-07-05T22:34:17.820793] conditioning_length: 44000
[Training] [2023-07-05T22:34:17.822797] use_bpe_tokenizer: True
[Training] [2023-07-05T22:34:17.825306] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-07-05T22:34:17.826304] load_aligned_codes: False
[Training] [2023-07-05T22:34:17.828813] data_type: img
[Training] [2023-07-05T22:34:17.830812] ]
[Training] [2023-07-05T22:34:17.832815] ]
[Training] [2023-07-05T22:34:17.833816] steps:[
[Training] [2023-07-05T22:34:17.835818] gpt_train:[
[Training] [2023-07-05T22:34:17.838325] training: gpt
[Training] [2023-07-05T22:34:17.840829] loss_log_buffer: 500
[Training] [2023-07-05T22:34:17.841845] optimizer: adamw
[Training] [2023-07-05T22:34:17.843843] optimizer_params:[
[Training] [2023-07-05T22:34:17.845351] lr: 0.0001
[Training] [2023-07-05T22:34:17.847352] weight_decay: 0.01
[Training] [2023-07-05T22:34:17.849393] beta1: 0.9
[Training] [2023-07-05T22:34:17.851900] beta2: 0.96
[Training] [2023-07-05T22:34:17.853413] ]
[Training] [2023-07-05T22:34:17.855409] clip_grad_eps: 4
[Training] [2023-07-05T22:34:17.857915] injectors:[
[Training] [2023-07-05T22:34:17.858916] paired_to_mel:[
[Training] [2023-07-05T22:34:17.861426] type: torch_mel_spectrogram
[Training] [2023-07-05T22:34:17.862427] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-07-05T22:34:17.864933] in: wav
[Training] [2023-07-05T22:34:17.866935] out: paired_mel
[Training] [2023-07-05T22:34:17.868937] ]
[Training] [2023-07-05T22:34:17.870444] paired_cond_to_mel:[
[Training] [2023-07-05T22:34:17.872463] type: for_each
[Training] [2023-07-05T22:34:17.874461] subtype: torch_mel_spectrogram
[Training] [2023-07-05T22:34:17.875969] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-07-05T22:34:17.877969] in: conditioning
[Training] [2023-07-05T22:34:17.879474] out: paired_conditioning_mel
[Training] [2023-07-05T22:34:17.881988] ]
[Training] [2023-07-05T22:34:17.884002] to_codes:[
[Training] [2023-07-05T22:34:17.886004] type: discrete_token
[Training] [2023-07-05T22:34:17.888513] in: paired_mel
[Training] [2023-07-05T22:34:17.889511] out: paired_mel_codes
[Training] [2023-07-05T22:34:17.892024] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-07-05T22:34:17.894035] ]
[Training] [2023-07-05T22:34:17.896038] paired_fwd_text:[
[Training] [2023-07-05T22:34:17.897544] type: generator
[Training] [2023-07-05T22:34:17.899546] generator: gpt
[Training] [2023-07-05T22:34:17.901066] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-07-05T22:34:17.903069] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-07-05T22:34:17.905069] ]
[Training] [2023-07-05T22:34:17.906575] ]
[Training] [2023-07-05T22:34:17.908082] losses:[
[Training] [2023-07-05T22:34:17.910081] text_ce:[
[Training] [2023-07-05T22:34:17.912655] type: direct
[Training] [2023-07-05T22:34:17.914160] weight: 0.01
[Training] [2023-07-05T22:34:17.916162] key: loss_text_ce
[Training] [2023-07-05T22:34:17.917666] ]
[Training] [2023-07-05T22:34:17.919668] mel_ce:[
[Training] [2023-07-05T22:34:17.921173] type: direct
[Training] [2023-07-05T22:34:17.923182] weight: 1
[Training] [2023-07-05T22:34:17.925180] key: loss_mel_ce
[Training] [2023-07-05T22:34:17.926724] ]
[Training] [2023-07-05T22:34:17.928686] ]
[Training] [2023-07-05T22:34:17.930864] ]
[Training] [2023-07-05T22:34:17.931865] ]
[Training] [2023-07-05T22:34:17.934383] networks:[
[Training] [2023-07-05T22:34:17.936384] gpt:[
[Training] [2023-07-05T22:34:17.937889] type: generator
[Training] [2023-07-05T22:34:17.939396] which_model_G: unified_voice2
[Training] [2023-07-05T22:34:17.941397] kwargs:[
[Training] [2023-07-05T22:34:17.943903] layers: 30
[Training] [2023-07-05T22:34:17.945414] model_dim: 1024
[Training] [2023-07-05T22:34:17.947412] heads: 16
[Training] [2023-07-05T22:34:17.949413] max_text_tokens: 402
[Training] [2023-07-05T22:34:17.950921] max_mel_tokens: 604
[Training] [2023-07-05T22:34:17.952925] max_conditioning_inputs: 2
[Training] [2023-07-05T22:34:17.954929] mel_length_compression: 1024
[Training] [2023-07-05T22:34:17.956927] number_text_tokens: 256
[Training] [2023-07-05T22:34:17.957970] number_mel_codes: 8194
[Training] [2023-07-05T22:34:17.959931] start_mel_token: 8192
[Training] [2023-07-05T22:34:17.961948] stop_mel_token: 8193
[Training] [2023-07-05T22:34:17.963452] start_text_token: 255
[Training] [2023-07-05T22:34:17.965457] train_solo_embeddings: False
[Training] [2023-07-05T22:34:17.967459] use_mel_codes_as_input: True
[Training] [2023-07-05T22:34:17.969964] checkpointing: True
[Training] [2023-07-05T22:34:17.971473] tortoise_compat: True
[Training] [2023-07-05T22:34:17.973488] ]
[Training] [2023-07-05T22:34:17.974489] ]
[Training] [2023-07-05T22:34:17.976994] ]
[Training] [2023-07-05T22:34:17.977996] path:[
[Training] [2023-07-05T22:34:17.980502] strict_load: True
[Training] [2023-07-05T22:34:17.982512] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-07-05T22:34:17.984507] root: ./
[Training] [2023-07-05T22:34:17.985989] experiments_root: ./training\white mask varre\finetune
[Training] [2023-07-05T22:34:17.987984] models: ./training\white mask varre\finetune\models
[Training] [2023-07-05T22:34:17.989492] training_state: ./training\white mask varre\finetune\training_state
[Training] [2023-07-05T22:34:17.991492] log: ./training\white mask varre\finetune
[Training] [2023-07-05T22:34:17.993027] val_images: ./training\white mask varre\finetune\val_images
[Training] [2023-07-05T22:34:17.996026] ]
[Training] [2023-07-05T22:34:17.997532] train:[
[Training] [2023-07-05T22:34:17.999534] niter: 200
[Training] [2023-07-05T22:34:18.001038] warmup_iter: -1
[Training] [2023-07-05T22:34:18.003048] mega_batch_factor: 25
[Training] [2023-07-05T22:34:18.005044] val_freq: 5
[Training] [2023-07-05T22:34:18.006553] ema_enabled: False
[Training] [2023-07-05T22:34:18.009059] default_lr_scheme: MultiStepLR
[Training] [2023-07-05T22:34:18.011057] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50, 59]
[Training] [2023-07-05T22:34:18.013061] lr_gamma: 0.5
[Training] [2023-07-05T22:34:18.015064] ]
[Training] [2023-07-05T22:34:18.017068] eval:[
[Training] [2023-07-05T22:34:18.019070] pure: False
[Training] [2023-07-05T22:34:18.020576] output_state: gen
[Training] [2023-07-05T22:34:18.022579] ]
[Training] [2023-07-05T22:34:18.024084] logger:[
[Training] [2023-07-05T22:34:18.026086] save_checkpoint_freq: 5
[Training] [2023-07-05T22:34:18.027591] visuals: ['gen', 'mel']
[Training] [2023-07-05T22:34:18.030100] visual_debug_rate: 5
[Training] [2023-07-05T22:34:18.031097] is_mel_spectrogram: True
[Training] [2023-07-05T22:34:18.033107] ]
[Training] [2023-07-05T22:34:18.035103] is_train: True
[Training] [2023-07-05T22:34:18.036616] dist: False
[Training] [2023-07-05T22:34:18.038611]
[Training] [2023-07-05T22:34:18.040118] 23-07-05 22:34:17.738 - INFO: Random seed: 1217
[Training] [2023-07-05T22:34:18.785112] 23-07-05 22:34:18.785 - INFO: Number of training data elements: 102, iters: 1
[Training] [2023-07-05T22:34:18.787634] 23-07-05 22:34:18.785 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-07-05T22:34:19.647798] C:\Users\A\Desktop\T TTS\ai-voice-cloning\venv\Lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-07-05T22:34:19.651310]   warnings.warn(
[Training] [2023-07-05T22:34:26.250082] 23-07-05 22:34:26.250 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-07-05T22:34:26.911135] 23-07-05 22:34:26.905 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-07-05T22:34:28.832499] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-07-05T22:34:30.872644] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-07-05T22:34:31.550691] C:\Users\A\Desktop\T TTS\ai-voice-cloning\venv\Lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-07-05T22:34:31.551197]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-07-05T22:35:03.830276] Disabled distributed training.
[Training] [2023-07-05T22:35:03.830276] Path already exists. Rename it to [./training\white mask varre\finetune_archived_230705-223417]
[Training] [2023-07-05T22:35:03.830276] Loading from ./models/tortoise/dvae.pth
[Training] [2023-07-05T22:35:03.831277] Traceback (most recent call last):
[Training] [2023-07-05T22:35:03.831277] File "C:\Users\A\Desktop\T TTS\ai-voice-cloning\src\train.py", line 64, in <module>
[Training] [2023-07-05T22:35:03.831277] train(config_path, args.launcher)
[Training] [2023-07-05T22:35:03.831277] File "C:\Users\A\Desktop\T TTS\ai-voice-cloning\src\train.py", line 31, in train
[Training] [2023-07-05T22:35:03.831277] trainer.do_training()
[Training] [2023-07-05T22:35:03.831277] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2023-07-05T22:35:03.831277] metric = self.do_step(train_data)
[Training] [2023-07-05T22:35:03.831277] ^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.831277] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2023-07-05T22:35:03.832278] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2023-07-05T22:35:03.832278] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.832278] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2023-07-05T22:35:03.832278] ns = step.do_forward_backward(
[Training] [2023-07-05T22:35:03.832783] ^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.832783] File "c:\users\a\desktop\t tts\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2023-07-05T22:35:03.832783] local_state[k] = v[grad_accum_step]
[Training] [2023-07-05T22:35:03.832783] ~^^^^^^^^^^^^^^^^^
[Training] [2023-07-05T22:35:03.832783] IndexError: list index out of range
and then it just freezes everything
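For context, the `IndexError` at `local_state[k] = v[grad_accum_step]` is consistent with the batch splitting unevenly for gradient accumulation: the config above has `batch_size: 102` but `mega_batch_factor: 25`, and 102 items cannot be divided into 25 equal chunks. A minimal sketch of the suspected mechanism, assuming `mega_batch_factor` is the number of gradient-accumulation chunks and the trainer splits each batch with `torch.chunk` (the loop below is illustrative, not the repo's exact code):

```python
import torch

batch_size, mega_batch_factor = 102, 25  # values from the config above

# torch.chunk with an indivisible size returns FEWER chunks than requested:
# each chunk holds ceil(102 / 25) = 5 items, so only 21 chunks come back.
batch = torch.zeros(batch_size, 1)
chunks = torch.chunk(batch, mega_batch_factor)
print(len(chunks))  # 21, not 25

# A loop that assumes mega_batch_factor chunks over-indexes the list,
# matching the v[grad_accum_step] IndexError in the traceback:
for grad_accum_step in range(mega_batch_factor):
    if grad_accum_step >= len(chunks):
        print(f"IndexError would be raised at step {grad_accum_step}")
        break
    local_state = chunks[grad_accum_step]
```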
How much VRAM do you have? If it's 8GB or less then knock the # of training elements down to 96 and try with a batch size of 32.
16 GB GDDR6X
That should be more than enough, but it wouldn't hurt to try 96 anyway.
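If the uneven-split explanation above is right, the practical check before retrying is whether the batch size divides evenly by the gradient-accumulation factor (and, ideally, whether the dataset size divides evenly by the batch size). A quick hypothetical sanity check — the function name is illustrative, not part of the repo:

```python
# Hypothetical helper: verify a batch splits into equal gradient-accumulation chunks.
def splits_evenly(batch_size: int, grad_accum_factor: int) -> bool:
    return batch_size % grad_accum_factor == 0

print(splits_evenly(102, 25))  # False -> the failing config above
print(splits_evenly(96, 24))   # True  -> 96 splits into 24 chunks of 4
print(splits_evenly(32, 8))    # True  -> e.g. the suggested batch size of 32 with a factor of 8
```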
I have the same issue. 12 GB of VRAM; I tried changing the batch size, but it didn't resolve it.