Training starts, then immediately stops and reports as "finished". #169

Closed
opened 2023-03-24 02:36:50 +00:00 by sazandora · 4 comments

Python Version: 3.10.6
GPU: RTX 2070 Super (Max Q)
OS: Windows 10
Summary of what I was trying to do: Upon trying to train a model, it trains for (presumably) zero steps, then reports back as having finished. The only consistent errors I've been receiving are the gradient checkpointing one (below, in the first stack) and another about `lr_scheduler.step()` being called before `optimizer.step()` (see the second stack). Any ideas?

E:\ai-voice-cloning>call .\venv\Scripts\activate.bat
!WARNING! Automatically deduced sample batch size returned 1.
!WARNING! Automatically deduced sample batch size returned 1.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Spawning process:  train.bat ./training/desco/train.yaml
[Training] [2023-03-23T22:21:14.829083]
[Training] [2023-03-23T22:21:14.833072] (venv) E:\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-03-23T22:21:17.392331] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-03-23T22:21:20.531822] 23-03-23 22:21:20.524 - INFO:   name: desco
[Training] [2023-03-23T22:21:20.536770]   model: extensibletrainer
[Training] [2023-03-23T22:21:20.539762]   scale: 1
[Training] [2023-03-23T22:21:20.542754]   gpu_ids: [0]
[Training] [2023-03-23T22:21:20.548739]   start_step: 0
[Training] [2023-03-23T22:21:20.551731]   checkpointing_enabled: True
[Training] [2023-03-23T22:21:20.555720]   fp16: False
[Training] [2023-03-23T22:21:20.558712]   bitsandbytes: True
[Training] [2023-03-23T22:21:20.562701]   gpus: 1
[Training] [2023-03-23T22:21:20.566690]   datasets:[
[Training] [2023-03-23T22:21:20.570680]     train:[
[Training] [2023-03-23T22:21:20.573671]       name: training
[Training] [2023-03-23T22:21:20.577662]       n_workers: 2
[Training] [2023-03-23T22:21:20.580653]       batch_size: 148
[Training] [2023-03-23T22:21:20.584642]       mode: paired_voice_audio
[Training] [2023-03-23T22:21:20.587633]       path: ./training/desco/train.txt
[Training] [2023-03-23T22:21:20.590640]       fetcher_mode: ['lj']
[Training] [2023-03-23T22:21:20.594616]       phase: train
[Training] [2023-03-23T22:21:20.597608]       max_wav_length: 255995
[Training] [2023-03-23T22:21:20.601596]       max_text_length: 200
[Training] [2023-03-23T22:21:20.605587]       sample_rate: 22050
[Training] [2023-03-23T22:21:20.608578]       load_conditioning: True
[Training] [2023-03-23T22:21:20.612568]       num_conditioning_candidates: 2
[Training] [2023-03-23T22:21:20.615560]       conditioning_length: 44000
[Training] [2023-03-23T22:21:20.619550]       use_bpe_tokenizer: True
[Training] [2023-03-23T22:21:20.623538]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-03-23T22:21:20.627542]       load_aligned_codes: False
[Training] [2023-03-23T22:21:20.631516]       data_type: img
[Training] [2023-03-23T22:21:20.634508]     ]
[Training] [2023-03-23T22:21:20.638514]     val:[
[Training] [2023-03-23T22:21:20.642498]       name: validation
[Training] [2023-03-23T22:21:20.646476]       n_workers: 2
[Training] [2023-03-23T22:21:20.649468]       batch_size: 3
[Training] [2023-03-23T22:21:20.653457]       mode: paired_voice_audio
[Training] [2023-03-23T22:21:20.656450]       path: ./training/desco/validation.txt
[Training] [2023-03-23T22:21:20.659442]       fetcher_mode: ['lj']
[Training] [2023-03-23T22:21:20.662433]       phase: val
[Training] [2023-03-23T22:21:20.666423]       max_wav_length: 255995
[Training] [2023-03-23T22:21:20.670414]       max_text_length: 200
[Training] [2023-03-23T22:21:20.673431]       sample_rate: 22050
[Training] [2023-03-23T22:21:20.676423]       load_conditioning: True
[Training] [2023-03-23T22:21:20.679388]       num_conditioning_candidates: 2
[Training] [2023-03-23T22:21:20.682380]       conditioning_length: 44000
[Training] [2023-03-23T22:21:20.687371]       use_bpe_tokenizer: True
[Training] [2023-03-23T22:21:20.690359]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-03-23T22:21:20.693351]       load_aligned_codes: False
[Training] [2023-03-23T22:21:20.696343]       data_type: img
[Training] [2023-03-23T22:21:20.700332]     ]
[Training] [2023-03-23T22:21:20.703325]   ]
[Training] [2023-03-23T22:21:20.707314]   steps:[
[Training] [2023-03-23T22:21:20.710308]     gpt_train:[
[Training] [2023-03-23T22:21:20.713298]       training: gpt
[Training] [2023-03-23T22:21:20.718285]       loss_log_buffer: 500
[Training] [2023-03-23T22:21:20.721276]       optimizer: adamw
[Training] [2023-03-23T22:21:20.725266]       optimizer_params:[
[Training] [2023-03-23T22:21:20.728258]         lr: 1e-05
[Training] [2023-03-23T22:21:20.732248]         weight_decay: 0.01
[Training] [2023-03-23T22:21:20.735240]         beta1: 0.9
[Training] [2023-03-23T22:21:20.739228]         beta2: 0.96
[Training] [2023-03-23T22:21:20.742220]       ]
[Training] [2023-03-23T22:21:20.745212]       clip_grad_eps: 4
[Training] [2023-03-23T22:21:20.749202]       injectors:[
[Training] [2023-03-23T22:21:20.752194]         paired_to_mel:[
[Training] [2023-03-23T22:21:20.756183]           type: torch_mel_spectrogram
[Training] [2023-03-23T22:21:20.759178]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-03-23T22:21:20.763164]           in: wav
[Training] [2023-03-23T22:21:20.766157]           out: paired_mel
[Training] [2023-03-23T22:21:20.770146]         ]
[Training] [2023-03-23T22:21:20.773137]         paired_cond_to_mel:[
[Training] [2023-03-23T22:21:20.779121]           type: for_each
[Training] [2023-03-23T22:21:20.784109]           subtype: torch_mel_spectrogram
[Training] [2023-03-23T22:21:20.787101]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-03-23T22:21:20.791090]           in: conditioning
[Training] [2023-03-23T22:21:20.795080]           out: paired_conditioning_mel
[Training] [2023-03-23T22:21:20.799068]         ]
[Training] [2023-03-23T22:21:20.802060]         to_codes:[
[Training] [2023-03-23T22:21:20.806050]           type: discrete_token
[Training] [2023-03-23T22:21:20.810039]           in: paired_mel
[Training] [2023-03-23T22:21:20.813030]           out: paired_mel_codes
[Training] [2023-03-23T22:21:20.816022]           dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-03-23T22:21:20.820013]         ]
[Training] [2023-03-23T22:21:20.823005]         paired_fwd_text:[
[Training] [2023-03-23T22:21:20.825996]           type: generator
[Training] [2023-03-23T22:21:20.829986]           generator: gpt
[Training] [2023-03-23T22:21:20.832978]           in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-03-23T22:21:20.835992]           out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-03-23T22:21:20.838962]         ]
[Training] [2023-03-23T22:21:20.842952]       ]
[Training] [2023-03-23T22:21:20.845944]       losses:[
[Training] [2023-03-23T22:21:20.848941]         text_ce:[
[Training] [2023-03-23T22:21:20.852926]           type: direct
[Training] [2023-03-23T22:21:20.855917]           weight: 0.01
[Training] [2023-03-23T22:21:20.858935]           key: loss_text_ce
[Training] [2023-03-23T22:21:20.862898]         ]
[Training] [2023-03-23T22:21:20.865890]         mel_ce:[
[Training] [2023-03-23T22:21:20.868881]           type: direct
[Training] [2023-03-23T22:21:20.871874]           weight: 1
[Training] [2023-03-23T22:21:20.874866]           key: loss_mel_ce
[Training] [2023-03-23T22:21:20.877858]         ]
[Training] [2023-03-23T22:21:20.880850]       ]
[Training] [2023-03-23T22:21:20.883842]     ]
[Training] [2023-03-23T22:21:20.887831]   ]
[Training] [2023-03-23T22:21:20.890823]   networks:[
[Training] [2023-03-23T22:21:20.893815]     gpt:[
[Training] [2023-03-23T22:21:20.896807]       type: generator
[Training] [2023-03-23T22:21:20.899799]       which_model_G: unified_voice2
[Training] [2023-03-23T22:21:20.903788]       kwargs:[
[Training] [2023-03-23T22:21:20.906780]         layers: 30
[Training] [2023-03-23T22:21:20.909772]         model_dim: 1024
[Training] [2023-03-23T22:21:20.912764]         heads: 16
[Training] [2023-03-23T22:21:20.915756]         max_text_tokens: 402
[Training] [2023-03-23T22:21:20.919746]         max_mel_tokens: 604
[Training] [2023-03-23T22:21:20.923734]         max_conditioning_inputs: 2
[Training] [2023-03-23T22:21:20.925729]         mel_length_compression: 1024
[Training] [2023-03-23T22:21:20.929718]         number_text_tokens: 256
[Training] [2023-03-23T22:21:20.932711]         number_mel_codes: 8194
[Training] [2023-03-23T22:21:20.936700]         start_mel_token: 8192
[Training] [2023-03-23T22:21:20.939692]         stop_mel_token: 8193
[Training] [2023-03-23T22:21:20.942684]         start_text_token: 255
[Training] [2023-03-23T22:21:20.946674]         train_solo_embeddings: False
[Training] [2023-03-23T22:21:20.949666]         use_mel_codes_as_input: True
[Training] [2023-03-23T22:21:20.953654]         checkpointing: True
[Training] [2023-03-23T22:21:20.956647]         tortoise_compat: True
[Training] [2023-03-23T22:21:20.960636]       ]
[Training] [2023-03-23T22:21:20.963628]     ]
[Training] [2023-03-23T22:21:20.967618]   ]
[Training] [2023-03-23T22:21:20.971607]   path:[
[Training] [2023-03-23T22:21:20.974616]     strict_load: True
[Training] [2023-03-23T22:21:20.977591]     pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-03-23T22:21:20.981580]     root: ./
[Training] [2023-03-23T22:21:20.984572]     experiments_root: ./training\desco\finetune
[Training] [2023-03-23T22:21:20.987564]     models: ./training\desco\finetune\models
[Training] [2023-03-23T22:21:20.991553]     training_state: ./training\desco\finetune\training_state
[Training] [2023-03-23T22:21:20.994546]     log: ./training\desco\finetune
[Training] [2023-03-23T22:21:20.997537]     val_images: ./training\desco\finetune\val_images
[Training] [2023-03-23T22:21:21.000529]   ]
[Training] [2023-03-23T22:21:21.004518]   train:[
[Training] [2023-03-23T22:21:21.007510]     niter: 500
[Training] [2023-03-23T22:21:21.011500]     warmup_iter: -1
[Training] [2023-03-23T22:21:21.015522]     mega_batch_factor: 37
[Training] [2023-03-23T22:21:21.018492]     val_freq: 5
[Training] [2023-03-23T22:21:21.022471]     ema_enabled: False
[Training] [2023-03-23T22:21:21.025463]     default_lr_scheme: MultiStepLR
[Training] [2023-03-23T22:21:21.028473]     gen_lr_steps: [2, 4, 9, 18, 25, 33, 50]
[Training] [2023-03-23T22:21:21.032445]     lr_gamma: 0.5
[Training] [2023-03-23T22:21:21.035436]   ]
[Training] [2023-03-23T22:21:21.038428]   eval:[
[Training] [2023-03-23T22:21:21.041420]     pure: False
[Training] [2023-03-23T22:21:21.045409]     output_state: gen
[Training] [2023-03-23T22:21:21.048401]   ]
[Training] [2023-03-23T22:21:21.051393]   logger:[
[Training] [2023-03-23T22:21:21.054385]     save_checkpoint_freq: 5
[Training] [2023-03-23T22:21:21.057377]     visuals: ['gen', 'mel']
[Training] [2023-03-23T22:21:21.060369]     visual_debug_rate: 5
[Training] [2023-03-23T22:21:21.063361]     is_mel_spectrogram: True
[Training] [2023-03-23T22:21:21.066354]   ]
[Training] [2023-03-23T22:21:21.069345]   is_train: True
[Training] [2023-03-23T22:21:21.072337]   dist: False
[Training] [2023-03-23T22:21:21.076327]
[Training] [2023-03-23T22:21:21.079338] 23-03-23 22:21:20.531 - INFO: Random seed: 9214
[Training] [2023-03-23T22:21:24.363664] 23-03-23 22:21:24.358 - INFO: Number of training data elements: 74, iters: 1
[Training] [2023-03-23T22:21:24.368661] 23-03-23 22:21:24.363 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-03-23T22:21:26.067199] E:\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-03-23T22:21:26.072200]   warnings.warn(
[Training] [2023-03-23T22:21:38.064530] 23-03-23 22:21:38.060 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-03-23T22:21:39.024503] 23-03-23 22:21:39.008 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-03-23T22:21:41.296396] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-03-23T22:21:44.228557] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-03-23T22:22:17.483483] 23-03-23 22:22:17.474 - INFO: Saving models and training states.
[Training] [2023-03-23T22:22:17.491461] 23-03-23 22:22:17.483 - INFO: Finished training!

And the lr_scheduler one (though it didn't show in this instance):

Warning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.
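
For reference, this warning is about call order rather than a training failure: since PyTorch 1.1.0, `optimizer.step()` is expected to run before `lr_scheduler.step()` each iteration, otherwise the first value of the LR schedule is skipped. A minimal sketch with a toy model (the `MultiStepLR` milestones here just mirror the config above; this is not the trainer's actual loop):

```python
import torch

model = torch.nn.Linear(10, 1)  # toy model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2, 4, 9], gamma=0.5)

for step in range(5):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).mean()
    loss.backward()
    optimizer.step()   # update weights first...
    scheduler.step()   # ...then advance the LR schedule
```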

> And the lr_scheduler one (though it didn't show in this instance):

This one occurs for me also, but AFAIK it's harmless. The only thing there that looks weird to me is:

> [Training] [2023-03-23T22:21:20.580653]       batch_size: 148

But...

> [Training] [2023-03-23T22:21:24.363664] 23-03-23 22:21:24.358 - INFO: Number of training data elements: 74, iters: 1

Your batch size is double your dataset size. Try reducing it to the same size.
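
I don't know the exact trainer code path, but the numbers in your log are consistent with arithmetic like this (a sketch, not the actual DLAS code):

```python
import math

dataset_size = 74   # "Number of training data elements: 74"
batch_size = 148    # batch_size from train.yaml
niter = 500         # train.niter

# A batch larger than the dataset still yields at most one
# iteration per epoch, matching "iters: 1" in the log:
iters_per_epoch = max(1, dataset_size // batch_size)
epochs_needed = math.ceil(niter / iters_per_epoch)
print(iters_per_epoch, epochs_needed)  # 1 500 -- "Total epochs needed: 500"
```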

Author

Oh, I did the same with a batch size matching the dataset size before this! I just doubled it to make it evenly divisible by the gradient accumulation size, since that was the solution to an unrelated issue I was troubleshooting (a 'list index out of range' error). The original settings are as follows:

Batch size is larger than your dataset, clamping batch size to: 74
Batch size is not evenly divisible by the gradient accumulation size, adjusting gradient accumulation size to: 2
Batch ratio (37) is expected to exceed your VRAM capacity (8.000GB, suggested 4 batch size cap), adjusting gradient accumulation size to: 18
! EXPERIMENTAL ! BitsAndBytes requested.
For 500 epochs with 74 lines in batches of 74, iterating for 500 steps (1) steps per epoch)

So, basically the same, but with a batch size of 74 and a gradient accumulation size of 18. Curiously, I don't get the out-of-range error anymore; now it's just the same output you saw above in the first stack, with training finishing instantly.
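
For what it's worth, the doubled numbers do line up: the VRAM-relevant quantity is the micro-batch, i.e. the batch size divided by the gradient accumulation factor (`mega_batch_factor` in the generated YAML). An illustrative check, not the actual launcher code:

```python
# Illustrative arithmetic only (not the real launcher logic).
batch_size = 148
mega_batch_factor = 37                       # from the train.yaml above
assert batch_size % mega_batch_factor == 0   # must divide evenly
micro_batch = batch_size // mega_batch_factor
print(micro_batch)  # 4 -- the "suggested 4 batch size cap" in the warning
```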


Huh, that's weird. Here's a log for some training I did earlier today to compare with:

Spawning process:  ./train.sh ./training/HyeonSeo/train.yaml
[Training] [2023-03-23T13:12:10.328080] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64')}
[Training] [2023-03-23T13:12:10.332896]   warn(msg)
[Training] [2023-03-23T13:12:10.337398] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
[Training] [2023-03-23T13:12:10.343533]   warn(msg)
[Training] [2023-03-23T13:12:10.348468] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
[Training] [2023-03-23T13:12:10.353517]   warn(msg)
[Training] [2023-03-23T13:12:10.944038] WARNING:torch.distributed.run:
[Training] [2023-03-23T13:12:10.949758] *****************************************
[Training] [2023-03-23T13:12:10.954934] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[Training] [2023-03-23T13:12:10.961200] *****************************************
[Training] [2023-03-23T13:12:13.124055]
[Training] [2023-03-23T13:12:13.129117] ===================================BUG REPORT===================================
[Training] [2023-03-23T13:12:13.133933] Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
[Training] [2023-03-23T13:12:13.140507] ================================================================================
[Training] [2023-03-23T13:12:13.145093]
[Training] [2023-03-23T13:12:13.149419] ===================================BUG REPORT===================================
[Training] [2023-03-23T13:12:13.154263] Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
[Training] [2023-03-23T13:12:13.158623] ================================================================================
[Training] [2023-03-23T13:12:13.238219] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64')}
[Training] [2023-03-23T13:12:13.243136]   warn(msg)
[Training] [2023-03-23T13:12:13.247165] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64:/home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
[Training] [2023-03-23T13:12:13.253072]   warn(msg)
[Training] [2023-03-23T13:12:13.260501] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
[Training] [2023-03-23T13:12:13.267581]   warn(msg)
[Training] [2023-03-23T13:12:13.272367] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_j_rli4yy/none_769sz1m0/attempt_0/1/error.json')}
[Training] [2023-03-23T13:12:13.276923]   warn(msg)
[Training] [2023-03-23T13:12:13.280874] CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
[Training] [2023-03-23T13:12:13.285122] CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
[Training] [2023-03-23T13:12:13.289193] CUDA SETUP: Highest compute capability among GPUs detected: 8.6
[Training] [2023-03-23T13:12:13.293738] CUDA SETUP: Detected CUDA version 118
[Training] [2023-03-23T13:12:13.298467] CUDA SETUP: Loading binary /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[Training] [2023-03-23T13:12:13.303879] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64')}
[Training] [2023-03-23T13:12:13.308047]   warn(msg)
[Training] [2023-03-23T13:12:13.315069] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64:/home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/cv2/../../lib64:/usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
[Training] [2023-03-23T13:12:13.319544]   warn(msg)
[Training] [2023-03-23T13:12:13.323918] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
[Training] [2023-03-23T13:12:13.328921]   warn(msg)
[Training] [2023-03-23T13:12:13.333743] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_j_rli4yy/none_769sz1m0/attempt_0/0/error.json')}
[Training] [2023-03-23T13:12:13.338132]   warn(msg)
[Training] [2023-03-23T13:12:13.342703] CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
[Training] [2023-03-23T13:12:13.347503] CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
[Training] [2023-03-23T13:12:13.353330] CUDA SETUP: Highest compute capability among GPUs detected: 8.6
[Training] [2023-03-23T13:12:13.357901] CUDA SETUP: Detected CUDA version 118
[Training] [2023-03-23T13:12:13.362184] CUDA SETUP: Loading binary /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[Training] [2023-03-23T13:12:16.972000] 23-03-23 13:12:16.971 - INFO:   name: HyeonSeo
[Training] [2023-03-23T13:12:16.976824]   model: extensibletrainer
[Training] [2023-03-23T13:12:16.980637]   scale: 1
[Training] [2023-03-23T13:12:16.986957]   gpu_ids: [0]
[Training] [2023-03-23T13:12:16.990795]   start_step: 0
[Training] [2023-03-23T13:12:16.995477]   checkpointing_enabled: True
[Training] [2023-03-23T13:12:17.000389]   fp16: False
[Training] [2023-03-23T13:12:17.004785]   bitsandbytes: True
[Training] [2023-03-23T13:12:17.008096]   gpus: 2
[Training] [2023-03-23T13:12:17.011382]   datasets:[
[Training] [2023-03-23T13:12:17.014762]     train:[
[Training] [2023-03-23T13:12:17.019084]       name: training
[Training] [2023-03-23T13:12:17.022663]       n_workers: 2
[Training] [2023-03-23T13:12:17.026089]       batch_size: 128
[Training] [2023-03-23T13:12:17.029547]       mode: paired_voice_audio
[Training] [2023-03-23T13:12:17.033081]       path: ./training/HyeonSeo/train.txt
[Training] [2023-03-23T13:12:17.037349]       fetcher_mode: ['lj']
[Training] [2023-03-23T13:12:17.040867]       phase: train
[Training] [2023-03-23T13:12:17.044286]       max_wav_length: 255995
[Training] [2023-03-23T13:12:17.049296]       max_text_length: 200
[Training] [2023-03-23T13:12:17.054039]       sample_rate: 22050
[Training] [2023-03-23T13:12:17.059547]       load_conditioning: True
[Training] [2023-03-23T13:12:17.063352]       num_conditioning_candidates: 2
[Training] [2023-03-23T13:12:17.067402]       conditioning_length: 44000
[Training] [2023-03-23T13:12:17.073203]       use_bpe_tokenizer: True
[Training] [2023-03-23T13:12:17.077798]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-03-23T13:12:17.081579]       load_aligned_codes: False
[Training] [2023-03-23T13:12:17.085849]       data_type: img
[Training] [2023-03-23T13:12:17.089773]     ]
[Training] [2023-03-23T13:12:17.093255]     val:[
[Training] [2023-03-23T13:12:17.096463]       name: validation
[Training] [2023-03-23T13:12:17.100331]       n_workers: 2
[Training] [2023-03-23T13:12:17.104311]       batch_size: 8
[Training] [2023-03-23T13:12:17.107604]       mode: paired_voice_audio
[Training] [2023-03-23T13:12:17.111134]       path: ./training/HyeonSeo/validation.txt
[Training] [2023-03-23T13:12:17.114455]       fetcher_mode: ['lj']
[Training] [2023-03-23T13:12:17.118348]       phase: val
[Training] [2023-03-23T13:12:17.121915]       max_wav_length: 255995
[Training] [2023-03-23T13:12:17.125254]       max_text_length: 200
[Training] [2023-03-23T13:12:17.128430]       sample_rate: 22050
[Training] [2023-03-23T13:12:17.133751]       load_conditioning: True
[Training] [2023-03-23T13:12:17.138340]       num_conditioning_candidates: 2
[Training] [2023-03-23T13:12:17.142730]       conditioning_length: 44000
[Training] [2023-03-23T13:12:17.146321]       use_bpe_tokenizer: True
[Training] [2023-03-23T13:12:17.150377]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-03-23T13:12:17.155509]       load_aligned_codes: False
[Training] [2023-03-23T13:12:17.160075]       data_type: img
[Training] [2023-03-23T13:12:17.163829]     ]
[Training] [2023-03-23T13:12:17.168444]   ]
[Training] [2023-03-23T13:12:17.173315]   steps:[
[Training] [2023-03-23T13:12:17.177708]     gpt_train:[
[Training] [2023-03-23T13:12:17.181311]       training: gpt
[Training] [2023-03-23T13:12:17.186100]       loss_log_buffer: 500
[Training] [2023-03-23T13:12:17.190527]       optimizer: adamw
[Training] [2023-03-23T13:12:17.195036]       optimizer_params:[
[Training] [2023-03-23T13:12:17.199469]         lr: 5e-05
[Training] [2023-03-23T13:12:17.204747]         weight_decay: 0.01
[Training] [2023-03-23T13:12:17.209475]         beta1: 0.9
[Training] [2023-03-23T13:12:17.214369]         beta2: 0.96
[Training] [2023-03-23T13:12:17.219069]       ]
[Training] [2023-03-23T13:12:17.223716]       clip_grad_eps: 4
[Training] [2023-03-23T13:12:17.228031]       injectors:[
[Training] [2023-03-23T13:12:17.233799]         paired_to_mel:[
[Training] [2023-03-23T13:12:17.238948]           type: torch_mel_spectrogram
[Training] [2023-03-23T13:12:17.243814]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-03-23T13:12:17.247333]           in: wav
[Training] [2023-03-23T13:12:17.251047]           out: paired_mel
[Training] [2023-03-23T13:12:17.255530]         ]
[Training] [2023-03-23T13:12:17.259337]         paired_cond_to_mel:[
[Training] [2023-03-23T13:12:17.263483]           type: for_each
[Training] [2023-03-23T13:12:17.267671]           subtype: torch_mel_spectrogram
[Training] [2023-03-23T13:12:17.272279]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-03-23T13:12:17.276555]           in: conditioning
[Training] [2023-03-23T13:12:17.280151]           out: paired_conditioning_mel
[Training] [2023-03-23T13:12:17.283983]         ]
[Training] [2023-03-23T13:12:17.287582]         to_codes:[
[Training] [2023-03-23T13:12:17.291257]           type: discrete_token
[Training] [2023-03-23T13:12:17.295126]           in: paired_mel
[Training] [2023-03-23T13:12:17.299058]           out: paired_mel_codes
[Training] [2023-03-23T13:12:17.303663]           dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-03-23T13:12:17.307765]         ]
[Training] [2023-03-23T13:12:17.311091]         paired_fwd_text:[
[Training] [2023-03-23T13:12:17.315081]           type: generator
[Training] [2023-03-23T13:12:17.319685]           generator: gpt
[Training] [2023-03-23T13:12:17.323524]           in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-03-23T13:12:17.328618]           out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-03-23T13:12:17.333509]         ]
[Training] [2023-03-23T13:12:17.338066]       ]
[Training] [2023-03-23T13:12:17.342744]       losses:[
[Training] [2023-03-23T13:12:17.347367]         text_ce:[
[Training] [2023-03-23T13:12:17.351927]           type: direct
[Training] [2023-03-23T13:12:17.356786]           weight: 0.01
[Training] [2023-03-23T13:12:17.361431]           key: loss_text_ce
[Training] [2023-03-23T13:12:17.365340]         ]
[Training] [2023-03-23T13:12:17.370061]         mel_ce:[
[Training] [2023-03-23T13:12:17.374028]           type: direct
[Training] [2023-03-23T13:12:17.377579]           weight: 1
[Training] [2023-03-23T13:12:17.380976]           key: loss_mel_ce
[Training] [2023-03-23T13:12:17.385009]         ]
[Training] [2023-03-23T13:12:17.389024]       ]
[Training] [2023-03-23T13:12:17.392393]     ]
[Training] [2023-03-23T13:12:17.395662]   ]
[Training] [2023-03-23T13:12:17.399439]   networks:[
[Training] [2023-03-23T13:12:17.403701]     gpt:[
[Training] [2023-03-23T13:12:17.406899]       type: generator
[Training] [2023-03-23T13:12:17.410173]       which_model_G: unified_voice2
[Training] [2023-03-23T13:12:17.413801]       kwargs:[
[Training] [2023-03-23T13:12:17.417678]         layers: 30
[Training] [2023-03-23T13:12:17.421208]         model_dim: 1024
[Training] [2023-03-23T13:12:17.425594]         heads: 16
[Training] [2023-03-23T13:12:17.429363]         max_text_tokens: 402
[Training] [2023-03-23T13:12:17.433341]         max_mel_tokens: 604
[Training] [2023-03-23T13:12:17.438123]         max_conditioning_inputs: 2
[Training] [2023-03-23T13:12:17.442102]         mel_length_compression: 1024
[Training] [2023-03-23T13:12:17.446080]         number_text_tokens: 256
[Training] [2023-03-23T13:12:17.449881]         number_mel_codes: 8194
[Training] [2023-03-23T13:12:17.454356]         start_mel_token: 8192
[Training] [2023-03-23T13:12:17.458325]         stop_mel_token: 8193
[Training] [2023-03-23T13:12:17.461909]         start_text_token: 255
[Training] [2023-03-23T13:12:17.465322]         train_solo_embeddings: False
[Training] [2023-03-23T13:12:17.469110]         use_mel_codes_as_input: True
[Training] [2023-03-23T13:12:17.472605]         checkpointing: True
[Training] [2023-03-23T13:12:17.476309]         tortoise_compat: True
[Training] [2023-03-23T13:12:17.479557]       ]
[Training] [2023-03-23T13:12:17.483441]     ]
[Training] [2023-03-23T13:12:17.487863]   ]
[Training] [2023-03-23T13:12:17.492004]   path:[
[Training] [2023-03-23T13:12:17.495432]     strict_load: True
[Training] [2023-03-23T13:12:17.498921]     resume_state: ./training/HyeonSeo/finetune/training_state//900.state
[Training] [2023-03-23T13:12:17.502915]     root: ./
[Training] [2023-03-23T13:12:17.506651]     experiments_root: ./training/HyeonSeo/finetune
[Training] [2023-03-23T13:12:17.511113]     models: ./training/HyeonSeo/finetune/models
[Training] [2023-03-23T13:12:17.515442]     training_state: ./training/HyeonSeo/finetune/training_state
[Training] [2023-03-23T13:12:17.520792]     log: ./training/HyeonSeo/finetune
[Training] [2023-03-23T13:12:17.525027]     val_images: ./training/HyeonSeo/finetune/val_images
[Training] [2023-03-23T13:12:17.528562]   ]
[Training] [2023-03-23T13:12:17.531945]   train:[
[Training] [2023-03-23T13:12:17.537227]     niter: 3000
[Training] [2023-03-23T13:12:17.541841]     warmup_iter: -1
[Training] [2023-03-23T13:12:17.546275]     mega_batch_factor: 16
[Training] [2023-03-23T13:12:17.550549]     val_freq: 20
[Training] [2023-03-23T13:12:17.554995]     ema_enabled: False
[Training] [2023-03-23T13:12:17.558572]     default_lr_scheme: MultiStepLR
[Training] [2023-03-23T13:12:17.561913]     gen_lr_steps: [8, 16, 36, 72, 100, 132, 200]
[Training] [2023-03-23T13:12:17.565301]     lr_gamma: 0.5
[Training] [2023-03-23T13:12:17.569521]   ]
[Training] [2023-03-23T13:12:17.573290]   eval:[
[Training] [2023-03-23T13:12:17.576439]     pure: True
[Training] [2023-03-23T13:12:17.579633]     output_state: gen
[Training] [2023-03-23T13:12:17.582900]   ]
[Training] [2023-03-23T13:12:17.587447]   logger:[
[Training] [2023-03-23T13:12:17.591928]     save_checkpoint_freq: 100
[Training] [2023-03-23T13:12:17.595732]     visuals: ['gen', 'mel']
[Training] [2023-03-23T13:12:17.599053]     visual_debug_rate: 100
[Training] [2023-03-23T13:12:17.602715]     is_mel_spectrogram: True
[Training] [2023-03-23T13:12:17.606076]   ]
[Training] [2023-03-23T13:12:17.609250]   is_train: True
[Training] [2023-03-23T13:12:17.614529]   dist: True
[Training] [2023-03-23T13:12:17.619572]
[Training] [2023-03-23T13:12:17.623729] 23-03-23 13:12:16.971 - INFO: Set model [gpt] to ./training/HyeonSeo/finetune/models/900_gpt.pth
[Training] [2023-03-23T13:12:17.627991] 23-03-23 13:12:16.972 - INFO: Random seed: 2481
[Training] [2023-03-23T13:12:17.887750] 23-03-23 13:12:17.887 - INFO: Number of training data elements: 512, iters: 4
[Training] [2023-03-23T13:12:17.893781] 23-03-23 13:12:17.887 - INFO: Total epochs needed: 750 for iters 3,000
[Training] [2023-03-23T13:12:17.898250] 23-03-23 13:12:17.892 - INFO: Number of val images in [validation]: 120
[Training] [2023-03-23T13:12:18.975519] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/transformers/configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-03-23T13:12:18.980206]   warnings.warn(
[Training] [2023-03-23T13:12:18.985660] /home/sneed/ai-voice-cloning/venv/lib/python3.10/site-packages/transformers/configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-03-23T13:12:18.991716]   warnings.warn(
[Training] [2023-03-23T13:12:28.744508] Loading from ./models/tortoise/dvae.pth
[Training] [2023-03-23T13:12:28.760195] Loading from ./models/tortoise/dvae.pth
[Training] [2023-03-23T13:12:30.121838] 23-03-23 13:12:30.120 - INFO: Loading model for [./training/HyeonSeo/finetune/models/900_gpt.pth]
[Training] [2023-03-23T13:12:35.404856] 23-03-23 13:12:35.403 - INFO: Resuming training from epoch: 221, iter: 900.
[Training] [2023-03-23T13:12:35.666013] 23-03-23 13:12:35.545 - INFO: Start training from epoch: 221, iter: 900
[Training] [2023-03-23T13:12:52.567346] 23-03-23 13:12:52.565 - INFO: Training Metrics: {"loss_text_ce": 2.561223030090332, "loss_mel_ce": 0.7925429344177246, "loss_gpt_total": 0.8181551098823547, "lr": 3.90625e-07, "it": 901, "step": 1, "steps": 4, "epoch": 221, "iteration_rate": 14.786593198776245}
[Training] [2023-03-23T13:13:04.799345] 23-03-23 13:13:04.798 - INFO: Training Metrics: {"loss_text_ce": 2.5922513008117676, "loss_mel_ce": 0.7928056716918945, "loss_gpt_total": 0.8187281489372253, "lr": 3.90625e-07, "it": 902, "step": 2, "steps": 4, "epoch": 221, "iteration_rate": 12.228018283843994}
[Training] [2023-03-23T13:13:17.313830] 23-03-23 13:13:17.312 - INFO: Training Metrics: {"loss_text_ce": 2.588486909866333, "loss_mel_ce": 0.7975842356681824, "loss_gpt_total": 0.8234691023826599, "lr": 3.90625e-07, "it": 903, "step": 3, "steps": 4, "epoch": 221, "iteration_rate": 12.510688066482544}
[Training] [2023-03-23T13:13:29.629437] 23-03-23 13:13:29.628 - INFO: Training Metrics: {"loss_text_ce": 2.598679780960083, "loss_mel_ce": 0.8032808303833008, "loss_gpt_total": 0.829267680644989, "lr": 3.90625e-07, "it": 904, "step": 4, "steps": 4, "epoch": 221, "iteration_rate": 12.311805963516235}
[Training] [2023-03-23T13:13:43.647493] 23-03-23 13:13:43.646 - INFO: Training Metrics: {"loss_text_ce": 2.62754487991333, "loss_mel_ce": 0.8290464282035828, "loss_gpt_total": 0.855322003364563, "lr": 3.90625e-07, "it": 905, "step": 1, "steps": 4, "epoch": 222, "iteration_rate": 12.529346704483032}
[Training] [2023-03-23T13:13:56.050374] 23-03-23 13:13:56.049 - INFO: Training Metrics: {"loss_text_ce": 2.6233415603637695, "loss_mel_ce": 0.8267382979393005, "loss_gpt_total": 0.8529717922210693, "lr": 3.90625e-07, "it": 906, "step": 2, "steps": 4, "epoch": 222, "iteration_rate": 12.39871072769165}
[Training] [2023-03-23T13:14:08.341644] 23-03-23 13:14:08.340 - INFO: Training Metrics: {"loss_text_ce": 2.6205732822418213, "loss_mel_ce": 0.82821124792099, "loss_gpt_total": 0.8544170260429382, "lr": 3.90625e-07, "it": 907, "step": 3, "steps": 4, "epoch": 222, "iteration_rate": 12.28763198852539}
[Training] [2023-03-23T13:14:20.678430] 23-03-23 13:14:20.677 - INFO: Training Metrics: {"loss_text_ce": 2.6153459548950195, "loss_mel_ce": 0.8285800218582153, "loss_gpt_total": 0.8547335267066956, "lr": 3.90625e-07, "it": 908, "step": 4, "steps": 4, "epoch": 222, "iteration_rate": 12.333005666732788}
[Training] [2023-03-23T13:14:34.947494] 23-03-23 13:14:34.946 - INFO: Training Metrics: {"loss_text_ce": 2.600755214691162, "loss_mel_ce": 0.8313522934913635, "loss_gpt_total": 0.8573598861694336, "lr": 3.90625e-07, "it": 909, "step": 1, "steps": 4, "epoch": 223, "iteration_rate": 12.734298467636108}
[Training] [2023-03-23T13:14:48.003932] 23-03-23 13:14:48.002 - INFO: Training Metrics: {"loss_text_ce": 2.5963339805603027, "loss_mel_ce": 0.8327493667602539, "loss_gpt_total": 0.8587127923965454, "lr": 3.90625e-07, "it": 910, "step": 2, "steps": 4, "epoch": 223, "iteration_rate": 13.052069902420044}
[Training] [2023-03-23T13:15:00.840959] 23-03-23 13:15:00.839 - INFO: Training Metrics: {"loss_text_ce": 2.5941975116729736, "loss_mel_ce": 0.8326621651649475, "loss_gpt_total": 0.8586041927337646, "lr": 3.90625e-07, "it": 911, "step": 3, "steps": 4, "epoch": 223, "iteration_rate": 12.83292841911316}
[Training] [2023-03-23T13:15:13.773065] 23-03-23 13:15:13.772 - INFO: Training Metrics: {"loss_text_ce": 2.593618869781494, "loss_mel_ce": 0.8316826820373535, "loss_gpt_total": 0.8576189279556274, "lr": 3.90625e-07, "it": 912, "step": 4, "steps": 4, "epoch": 223, "iteration_rate": 12.928085088729858}
[Training] [2023-03-23T13:15:28.440042] 23-03-23 13:15:28.438 - INFO: Training Metrics: {"loss_text_ce": 2.5887019634246826, "loss_mel_ce": 0.8272339701652527, "loss_gpt_total": 0.8531210422515869, "lr": 3.90625e-07, "it": 913, "step": 1, "steps": 4, "epoch": 224, "iteration_rate": 13.05258560180664}
[Training] [2023-03-23T13:15:41.722726] 23-03-23 13:15:41.721 - INFO: Training Metrics: {"loss_text_ce": 2.593032121658325, "loss_mel_ce": 0.8263492584228516, "loss_gpt_total": 0.852279543876648, "lr": 3.90625e-07, "it": 914, "step": 2, "steps": 4, "epoch": 224, "iteration_rate": 13.278674602508545}
[Training] [2023-03-23T13:15:54.976470] 23-03-23 13:15:54.975 - INFO: Training Metrics: {"loss_text_ce": 2.593416929244995, "loss_mel_ce": 0.8254876732826233, "loss_gpt_total": 0.8514218330383301, "lr": 3.90625e-07, "it": 915, "step": 3, "steps": 4, "epoch": 224, "iteration_rate": 13.249351978302002}
[Training] [2023-03-23T13:16:08.553505] 23-03-23 13:16:08.552 - INFO: Training Metrics: {"loss_text_ce": 2.5929136276245117, "loss_mel_ce": 0.8258978724479675, "loss_gpt_total": 0.8518270254135132, "lr": 3.90625e-07, "it": 916, "step": 4, "steps": 4, "epoch": 224, "iteration_rate": 13.573248863220215}
[Training] [2023-03-23T13:16:23.450124] 23-03-23 13:16:23.448 - INFO: Training Metrics: {"loss_text_ce": 2.59108829498291, "loss_mel_ce": 0.8244689702987671, "loss_gpt_total": 0.8503797650337219, "lr": 3.90625e-07, "it": 917, "step": 1, "steps": 4, "epoch": 225, "iteration_rate": 13.316820859909058}
[Training] [2023-03-23T13:16:37.027459] 23-03-23 13:16:37.026 - INFO: Training Metrics: {"loss_text_ce": 2.5956287384033203, "loss_mel_ce": 0.8218204975128174, "loss_gpt_total": 0.8477767109870911, "lr": 3.90625e-07, "it": 918, "step": 2, "steps": 4, "epoch": 225, "iteration_rate": 13.573445558547974}
[Training] [2023-03-23T13:16:50.872448] 23-03-23 13:16:50.871 - INFO: Training Metrics: {"loss_text_ce": 2.595446825027466, "loss_mel_ce": 0.8234513401985168, "loss_gpt_total": 0.849405825138092, "lr": 3.90625e-07, "it": 919, "step": 3, "steps": 4, "epoch": 225, "iteration_rate": 13.840927124023438}
[Training] [2023-03-23T13:17:04.696623] 23-03-23 13:17:04.695 - INFO: Training Metrics: {"loss_text_ce": 2.5941290855407715, "loss_mel_ce": 0.8249985575675964, "loss_gpt_total": 0.8509398698806763, "lr": 3.90625e-07, "it": 920, "step": 4, "steps": 4, "epoch": 225, "iteration_rate": 13.819649696350098}
[Training] [2023-03-23T13:17:20.074128] 23-03-23 13:17:20.073 - INFO: Beginning validation.

The only thing I can think of is to try reducing your dataset size to 64 so it divides evenly.
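
Something like this throwaway helper (hypothetical, not anything in the repo) can sanity-check which batch sizes fit; the `16` is the `mega_batch_factor` from my config above:

```python
# List batch sizes that fit the dataset and divide evenly
# by the gradient accumulation size.
def even_batch_sizes(dataset_size: int, grad_acc: int) -> list[int]:
    return list(range(grad_acc, dataset_size + 1, grad_acc))

print(even_batch_sizes(74, 16))  # [16, 32, 48, 64] -> 64 is the largest fit
```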

Author

Well, that solved it! Guess I won't go with near-prime numbers in the future, thanks a ton!
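
For anyone who hits this later, here's the kind of pre-flight trim I'll be doing from now on; a rough sketch assuming a one-clip-per-line manifest, with made-up output paths:

```
# Rough sketch (hypothetical paths): trim the training manifest to the
# largest power of two that fits, so batch splits always come out even.
def largest_pow2_at_most(n: int) -> int:
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

with open("./training/desco/train.txt", encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]

keep = largest_pow2_at_most(len(lines))  # e.g. 148 clips -> keep 128
with open("./training/desco/train_trimmed.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:keep])
print(f"kept {keep} of {len(lines)} clips")
```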
