Can't train #125

Closed
opened 2023-03-12 21:33:39 +00:00 by gasthemall · 2 comments

Python 3.9
GTX 3090

Fresh install. Trying to train.

![image](/attachments/db7ed579-d8bb-4d0d-9782-528fd1d8fc0c)

Spawning process: train.bat ./training/Somegirl/train.yaml
[Training] [2023-03-13T04:24:04.434194]
[Training] [2023-03-13T04:24:04.438195] (venv) C:\Users\PC\Desktop\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-03-13T04:24:06.420913] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-03-13T04:24:06.628386] 23-03-13 04:24:06.628 - INFO: name: Somegirl
[Training] [2023-03-13T04:24:06.632387] model: extensibletrainer
[Training] [2023-03-13T04:24:06.635386] scale: 1
[Training] [2023-03-13T04:24:06.639386] gpu_ids: [0]
[Training] [2023-03-13T04:24:06.643386] start_step: 0
[Training] [2023-03-13T04:24:06.647386] checkpointing_enabled: True
[Training] [2023-03-13T04:24:06.649386] fp16: False
[Training] [2023-03-13T04:24:06.652386] bitsandbytes: True
[Training] [2023-03-13T04:24:06.655386] gpus: 1
[Training] [2023-03-13T04:24:06.657386] datasets:[
[Training] [2023-03-13T04:24:06.661386] train:[
[Training] [2023-03-13T04:24:06.663386] name: training
[Training] [2023-03-13T04:24:06.666386] n_workers: 2
[Training] [2023-03-13T04:24:06.669386] batch_size: 1
[Training] [2023-03-13T04:24:06.671386] mode: paired_voice_audio
[Training] [2023-03-13T04:24:06.675387] path: ./training/Somegirl/train.txt
[Training] [2023-03-13T04:24:06.678386] fetcher_mode: ['lj']
[Training] [2023-03-13T04:24:06.682386] phase: train
[Training] [2023-03-13T04:24:06.685386] max_wav_length: 255995
[Training] [2023-03-13T04:24:06.689386] max_text_length: 200
[Training] [2023-03-13T04:24:06.693386] sample_rate: 22050
[Training] [2023-03-13T04:24:06.696386] load_conditioning: True
[Training] [2023-03-13T04:24:06.698386] num_conditioning_candidates: 2
[Training] [2023-03-13T04:24:06.701386] conditioning_length: 44000
[Training] [2023-03-13T04:24:06.703386] use_bpe_tokenizer: True
[Training] [2023-03-13T04:24:06.706386] tokenizer_vocab: ./models/tortoise/bpe_lowercase_asr_256.json
[Training] [2023-03-13T04:24:06.709386] load_aligned_codes: False
[Training] [2023-03-13T04:24:06.711386] data_type: img
[Training] [2023-03-13T04:24:06.713386] ]
[Training] [2023-03-13T04:24:06.715386] val:[
[Training] [2023-03-13T04:24:06.719285] name: validation
[Training] [2023-03-13T04:24:06.722282] n_workers: 2
[Training] [2023-03-13T04:24:06.729283] batch_size: 0
[Training] [2023-03-13T04:24:06.732283] mode: paired_voice_audio
[Training] [2023-03-13T04:24:06.735283] path: ./training/Somegirl/validation.txt
[Training] [2023-03-13T04:24:06.738282] fetcher_mode: ['lj']
[Training] [2023-03-13T04:24:06.741283] phase: val
[Training] [2023-03-13T04:24:06.745282] max_wav_length: 255995
[Training] [2023-03-13T04:24:06.747283] max_text_length: 200
[Training] [2023-03-13T04:24:06.750283] sample_rate: 22050
[Training] [2023-03-13T04:24:06.752282] load_conditioning: True
[Training] [2023-03-13T04:24:06.754283] num_conditioning_candidates: 2
[Training] [2023-03-13T04:24:06.757282] conditioning_length: 44000
[Training] [2023-03-13T04:24:06.760282] use_bpe_tokenizer: True
[Training] [2023-03-13T04:24:06.762284] tokenizer_vocab: ./models/tortoise/bpe_lowercase_asr_256.json
[Training] [2023-03-13T04:24:06.765282] load_aligned_codes: False
[Training] [2023-03-13T04:24:06.767282] data_type: img
[Training] [2023-03-13T04:24:06.770282] ]
[Training] [2023-03-13T04:24:06.772283] ]
[Training] [2023-03-13T04:24:06.775283] steps:[
[Training] [2023-03-13T04:24:06.777282] gpt_train:[
[Training] [2023-03-13T04:24:06.780283] training: gpt
[Training] [2023-03-13T04:24:06.782282] loss_log_buffer: 500
[Training] [2023-03-13T04:24:06.784283] optimizer: adamw
[Training] [2023-03-13T04:24:06.787282] optimizer_params:[
[Training] [2023-03-13T04:24:06.789282] lr: 1e-05
[Training] [2023-03-13T04:24:06.792282] weight_decay: 0.01
[Training] [2023-03-13T04:24:06.795282] beta1: 0.9
[Training] [2023-03-13T04:24:06.797282] beta2: 0.96
[Training] [2023-03-13T04:24:06.799282] ]
[Training] [2023-03-13T04:24:06.801282] clip_grad_eps: 4
[Training] [2023-03-13T04:24:06.804282] injectors:[
[Training] [2023-03-13T04:24:06.806283] paired_to_mel:[
[Training] [2023-03-13T04:24:06.809282] type: torch_mel_spectrogram
[Training] [2023-03-13T04:24:06.811282] mel_norm_file: ./models/tortoise/clips_mel_norms.pth
[Training] [2023-03-13T04:24:06.814283] in: wav
[Training] [2023-03-13T04:24:06.816282] out: paired_mel
[Training] [2023-03-13T04:24:06.819282] ]
[Training] [2023-03-13T04:24:06.822282] paired_cond_to_mel:[
[Training] [2023-03-13T04:24:06.824282] type: for_each
[Training] [2023-03-13T04:24:06.826282] subtype: torch_mel_spectrogram
[Training] [2023-03-13T04:24:06.829282] mel_norm_file: ./models/tortoise/clips_mel_norms.pth
[Training] [2023-03-13T04:24:06.832282] in: conditioning
[Training] [2023-03-13T04:24:06.835282] out: paired_conditioning_mel
[Training] [2023-03-13T04:24:06.838282] ]
[Training] [2023-03-13T04:24:06.841282] to_codes:[
[Training] [2023-03-13T04:24:06.843282] type: discrete_token
[Training] [2023-03-13T04:24:06.845282] in: paired_mel
[Training] [2023-03-13T04:24:06.849282] out: paired_mel_codes
[Training] [2023-03-13T04:24:06.851282] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-03-13T04:24:06.854283] ]
[Training] [2023-03-13T04:24:06.857282] paired_fwd_text:[
[Training] [2023-03-13T04:24:06.859282] type: generator
[Training] [2023-03-13T04:24:06.861282] generator: gpt
[Training] [2023-03-13T04:24:06.863282] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-03-13T04:24:06.866282] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-03-13T04:24:06.869283] ]
[Training] [2023-03-13T04:24:06.871282] ]
[Training] [2023-03-13T04:24:06.874282] losses:[
[Training] [2023-03-13T04:24:06.877282] text_ce:[
[Training] [2023-03-13T04:24:06.879282] type: direct
[Training] [2023-03-13T04:24:06.882282] weight: 0.01
[Training] [2023-03-13T04:24:06.885282] key: loss_text_ce
[Training] [2023-03-13T04:24:06.887282] ]
[Training] [2023-03-13T04:24:06.889282] mel_ce:[
[Training] [2023-03-13T04:24:06.892282] type: direct
[Training] [2023-03-13T04:24:06.894282] weight: 1
[Training] [2023-03-13T04:24:06.896282] key: loss_mel_ce
[Training] [2023-03-13T04:24:06.899282] ]
[Training] [2023-03-13T04:24:06.901282] ]
[Training] [2023-03-13T04:24:06.904282] ]
[Training] [2023-03-13T04:24:06.906282] ]
[Training] [2023-03-13T04:24:06.908282] networks:[
[Training] [2023-03-13T04:24:06.911282] gpt:[
[Training] [2023-03-13T04:24:06.913282] type: generator
[Training] [2023-03-13T04:24:06.915282] which_model_G: unified_voice2
[Training] [2023-03-13T04:24:06.918282] kwargs:[
[Training] [2023-03-13T04:24:06.921282] layers: 30
[Training] [2023-03-13T04:24:06.923282] model_dim: 1024
[Training] [2023-03-13T04:24:06.926282] heads: 16
[Training] [2023-03-13T04:24:06.928282] max_text_tokens: 402
[Training] [2023-03-13T04:24:06.931282] max_mel_tokens: 604
[Training] [2023-03-13T04:24:06.933282] max_conditioning_inputs: 2
[Training] [2023-03-13T04:24:06.936282] mel_length_compression: 1024
[Training] [2023-03-13T04:24:06.939283] number_text_tokens: 256
[Training] [2023-03-13T04:24:06.941282] number_mel_codes: 8194
[Training] [2023-03-13T04:24:06.944282] start_mel_token: 8192
[Training] [2023-03-13T04:24:06.947282] stop_mel_token: 8193
[Training] [2023-03-13T04:24:06.949282] start_text_token: 255
[Training] [2023-03-13T04:24:06.952383] train_solo_embeddings: False
[Training] [2023-03-13T04:24:06.954282] use_mel_codes_as_input: True
[Training] [2023-03-13T04:24:06.956282] checkpointing: True
[Training] [2023-03-13T04:24:06.959282] tortoise_compat: True
[Training] [2023-03-13T04:24:06.962282] ]
[Training] [2023-03-13T04:24:06.964282] ]
[Training] [2023-03-13T04:24:06.967282] ]
[Training] [2023-03-13T04:24:06.969282] path:[
[Training] [2023-03-13T04:24:06.972080] strict_load: True
[Training] [2023-03-13T04:24:06.974082] pretrain_model_gpt: C:\Users\PC\Desktop\ai-voice-cloning\models\tortoise\autoregressive.pth
[Training] [2023-03-13T04:24:06.976083] root: ./
[Training] [2023-03-13T04:24:06.979083] experiments_root: ./training\Somegirl\finetune
[Training] [2023-03-13T04:24:06.982083] models: ./training\Somegirl\finetune\models
[Training] [2023-03-13T04:24:06.984083] training_state: ./training\Somegirl\finetune\training_state
[Training] [2023-03-13T04:24:06.987099] log: ./training\Somegirl\finetune
[Training] [2023-03-13T04:24:06.989430] val_images: ./training\Somegirl\finetune\val_images
[Training] [2023-03-13T04:24:06.992384] ]
[Training] [2023-03-13T04:24:06.995384] train:[
[Training] [2023-03-13T04:24:06.997384] niter: 200
[Training] [2023-03-13T04:24:07.000384] warmup_iter: -1
[Training] [2023-03-13T04:24:07.002384] mega_batch_factor: 1
[Training] [2023-03-13T04:24:07.004384] val_freq: 5
[Training] [2023-03-13T04:24:07.007384] ema_enabled: False
[Training] [2023-03-13T04:24:07.010385] default_lr_scheme: MultiStepLR
[Training] [2023-03-13T04:24:07.013385] gen_lr_steps: [9, 18, 25, 33, 50, 59]
[Training] [2023-03-13T04:24:07.016384] lr_gamma: 0.5
[Training] [2023-03-13T04:24:07.019384] ]
[Training] [2023-03-13T04:24:07.022384] eval:[
[Training] [2023-03-13T04:24:07.025384] pure: True
[Training] [2023-03-13T04:24:07.027384] output_state: gen
[Training] [2023-03-13T04:24:07.030384] ]
[Training] [2023-03-13T04:24:07.032384] logger:[
[Training] [2023-03-13T04:24:07.035384] save_checkpoint_freq: 5
[Training] [2023-03-13T04:24:07.038384] visuals: ['gen', 'mel']
[Training] [2023-03-13T04:24:07.040384] visual_debug_rate: 5
[Training] [2023-03-13T04:24:07.043384] is_mel_spectrogram: True
[Training] [2023-03-13T04:24:07.045384] ]
[Training] [2023-03-13T04:24:07.048384] is_train: True
[Training] [2023-03-13T04:24:07.051384] dist: False
[Training] [2023-03-13T04:24:07.053384]
[Training] [2023-03-13T04:24:07.056384] 23-03-13 04:24:06.628 - INFO: Random seed: 4113
[Training] [2023-03-13T04:24:07.307952] 23-03-13 04:24:07.307 - INFO: Number of training data elements: 1, iters: 1
[Training] [2023-03-13T04:24:07.310952] 23-03-13 04:24:07.307 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-03-13T04:24:07.313952] 23-03-13 04:24:07.308 - INFO: Number of val images in [validation]: 0
[Training] [2023-03-13T04:24:08.134973] C:\Users\PC\Desktop\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:375: UserWarning: Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead, or if you are using the Trainer API, pass gradient_checkpointing=True in your TrainingArguments.
[Training] [2023-03-13T04:24:08.137973] warnings.warn(
[Training] [2023-03-13T04:24:12.723847] 23-03-13 04:24:12.723 - INFO: Loading model for [C:\Users\PC\Desktop\ai-voice-cloning\models\tortoise\autoregressive.pth]
[Training] [2023-03-13T04:24:13.619087] 23-03-13 04:24:13.614 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-03-13T04:24:15.552085] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-03-13T04:24:15.583085] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-03-13T04:24:16.364083] C:\Users\PC\Desktop\ai-voice-cloning./modules/dlas/codes\models\audio\tts\tacotron2\taco_utils.py:17: WavFileWarning: Chunk (non-data) not understood, skipping it.
[Training] [2023-03-13T04:24:16.364083] sampling_rate, data = read(full_path)
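For reference, the `Total epochs needed: 200 for iters 200` line in the log follows from simple epoch math: with 1 training element and `batch_size: 1`, each epoch contributes a single iteration. A minimal sketch of that relationship (a hypothetical helper, not the trainer's actual code):

```python
import math

def epochs_needed(niter: int, dataset_size: int, batch_size: int) -> int:
    """Epochs required to reach `niter` iterations when each epoch
    contributes ceil(dataset_size / batch_size) iterations."""
    iters_per_epoch = math.ceil(dataset_size / batch_size)
    return math.ceil(niter / iters_per_epoch)

# Values from the log above: 1 training element, batch_size 1, niter 200.
print(epochs_needed(niter=200, dataset_size=1, batch_size=1))  # → 200
```

A sliced dataset with more elements would pack more iterations into each epoch and reduce the epoch count accordingly.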

Owner

Now, usually it would just mean it's still initializing, as there's no (meaningful) error message by then. There's a bit of a gap between that point and when it finishes the first iteration for the UI to update with the metrics.

But

> INFO: Number of training data elements: 1, iters: 1

I think you should first slice your dataset into pieces in the UI. I'm not too sure that's the reason, but it's worth trying.
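The idea behind slicing can be sketched with the standard library: split one long recording into fixed-length chunks so the dataset has more than one element. This is only an illustration under assumed parameters (`slice_wav` is a hypothetical helper; the web UI's own dataset preparation slices along transcription boundaries instead of fixed intervals):

```python
import wave

def slice_wav(path: str, chunk_seconds: float = 10.0) -> list:
    """Split a WAV file into fixed-length chunks written next to the source.

    Hypothetical helper for illustration only; the last chunk may be shorter
    than `chunk_seconds`.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = "%s_%03d.wav" % (path.rsplit(".", 1)[0], index)
            with wave.open(out_path, "wb") as dst:
                # Preserve the source's channel count, sample width, and rate.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

A 25-second clip sliced at 10 seconds would yield three files (two full chunks plus a 5-second remainder), turning one training element into three.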

Author

It works now. I guess if the audio data is chunky enough, you really do need to slice it.

mrq closed this issue 2023-03-12 22:50:27 +00:00
Reference: mrq/ai-voice-cloning#125