getting strange error when trying to train
#371
This is the output I'm currently getting when trying to train. Not sure what else to share to help solve this (I've already tried reinstalling and using different CUDA versions; I'm also on Windows 10 with a GTX 980 Ti):
Spawning process: train.bat ./training/markiplier/train.yaml
[Training] [2023-09-05T20:27:18.993858]
[Training] [2023-09-05T20:27:18.998854] (venv) D:\TorToise\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-09-05T20:27:22.547885] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-09-05T20:27:26.897653] 23-09-05 20:27:26.897 - INFO: name: markiplier
[Training] [2023-09-05T20:27:26.902648] model: extensibletrainer
[Training] [2023-09-05T20:27:26.907642] scale: 1
[Training] [2023-09-05T20:27:26.912638] gpu_ids: [0]
[Training] [2023-09-05T20:27:26.919633] start_step: 0
[Training] [2023-09-05T20:27:26.924628] checkpointing_enabled: True
[Training] [2023-09-05T20:27:26.930623] fp16: False
[Training] [2023-09-05T20:27:26.935616] bitsandbytes: True
[Training] [2023-09-05T20:27:26.939614] gpus: 1
[Training] [2023-09-05T20:27:26.943609] datasets:[
[Training] [2023-09-05T20:27:26.948606] train:[
[Training] [2023-09-05T20:27:26.953601] name: training
[Training] [2023-09-05T20:27:26.957596] n_workers: 2
[Training] [2023-09-05T20:27:26.961593] batch_size: 128
[Training] [2023-09-05T20:27:26.966589] mode: paired_voice_audio
[Training] [2023-09-05T20:27:26.971584] path: ./training/markiplier/train.txt
[Training] [2023-09-05T20:27:26.976578] fetcher_mode: ['lj']
[Training] [2023-09-05T20:27:26.981575] phase: train
[Training] [2023-09-05T20:27:26.986569] max_wav_length: 255995
[Training] [2023-09-05T20:27:26.990565] max_text_length: 200
[Training] [2023-09-05T20:27:26.995561] sample_rate: 22050
[Training] [2023-09-05T20:27:26.999558] load_conditioning: True
[Training] [2023-09-05T20:27:27.004604] num_conditioning_candidates: 2
[Training] [2023-09-05T20:27:27.008550] conditioning_length: 44000
[Training] [2023-09-05T20:27:27.016542] use_bpe_tokenizer: True
[Training] [2023-09-05T20:27:27.020537] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-09-05T20:27:27.027533] load_aligned_codes: False
[Training] [2023-09-05T20:27:27.035524] data_type: img
[Training] [2023-09-05T20:27:27.041519] ]
[Training] [2023-09-05T20:27:27.047516] val:[
[Training] [2023-09-05T20:27:27.052509] name: validation
[Training] [2023-09-05T20:27:27.058503] n_workers: 2
[Training] [2023-09-05T20:27:27.064496] batch_size: 2
[Training] [2023-09-05T20:27:27.069493] mode: paired_voice_audio
[Training] [2023-09-05T20:27:27.073489] path: ./training/markiplier/validation.txt
[Training] [2023-09-05T20:27:27.078484] fetcher_mode: ['lj']
[Training] [2023-09-05T20:27:27.083480] phase: val
[Training] [2023-09-05T20:27:27.087476] max_wav_length: 255995
[Training] [2023-09-05T20:27:27.092471] max_text_length: 200
[Training] [2023-09-05T20:27:27.097467] sample_rate: 22050
[Training] [2023-09-05T20:27:27.102462] load_conditioning: True
[Training] [2023-09-05T20:27:27.106456] num_conditioning_candidates: 2
[Training] [2023-09-05T20:27:27.111452] conditioning_length: 44000
[Training] [2023-09-05T20:27:27.116447] use_bpe_tokenizer: True
[Training] [2023-09-05T20:27:27.120443] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-09-05T20:27:27.125440] load_aligned_codes: False
[Training] [2023-09-05T20:27:27.131434] data_type: img
[Training] [2023-09-05T20:27:27.136430] ]
[Training] [2023-09-05T20:27:27.141423] ]
[Training] [2023-09-05T20:27:27.146421] steps:[
[Training] [2023-09-05T20:27:27.150416] gpt_train:[
[Training] [2023-09-05T20:27:27.155413] training: gpt
[Training] [2023-09-05T20:27:27.161408] loss_log_buffer: 500
[Training] [2023-09-05T20:27:27.166402] optimizer: adamw
[Training] [2023-09-05T20:27:27.170399] optimizer_params:[
[Training] [2023-09-05T20:27:27.175393] lr: 1e-05
[Training] [2023-09-05T20:27:27.180388] weight_decay: 0.01
[Training] [2023-09-05T20:27:27.185384] beta1: 0.9
[Training] [2023-09-05T20:27:27.190381] beta2: 0.96
[Training] [2023-09-05T20:27:27.195374] ]
[Training] [2023-09-05T20:27:27.200371] clip_grad_eps: 4
[Training] [2023-09-05T20:27:27.205365] injectors:[
[Training] [2023-09-05T20:27:27.209360] paired_to_mel:[
[Training] [2023-09-05T20:27:27.214358] type: torch_mel_spectrogram
[Training] [2023-09-05T20:27:27.219353] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-09-05T20:27:27.223348] in: wav
[Training] [2023-09-05T20:27:27.229343] out: paired_mel
[Training] [2023-09-05T20:27:27.234339] ]
[Training] [2023-09-05T20:27:27.238335] paired_cond_to_mel:[
[Training] [2023-09-05T20:27:27.243329] type: for_each
[Training] [2023-09-05T20:27:27.248326] subtype: torch_mel_spectrogram
[Training] [2023-09-05T20:27:27.253320] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-09-05T20:27:27.258315] in: conditioning
[Training] [2023-09-05T20:27:27.263310] out: paired_conditioning_mel
[Training] [2023-09-05T20:27:27.269307] ]
[Training] [2023-09-05T20:27:27.274302] to_codes:[
[Training] [2023-09-05T20:27:27.280296] type: discrete_token
[Training] [2023-09-05T20:27:27.284291] in: paired_mel
[Training] [2023-09-05T20:27:27.289286] out: paired_mel_codes
[Training] [2023-09-05T20:27:27.295282] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-09-05T20:27:27.300277] ]
[Training] [2023-09-05T20:27:27.304272] paired_fwd_text:[
[Training] [2023-09-05T20:27:27.309268] type: generator
[Training] [2023-09-05T20:27:27.314263] generator: gpt
[Training] [2023-09-05T20:27:27.319258] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-09-05T20:27:27.325253] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-09-05T20:27:27.330248] ]
[Training] [2023-09-05T20:27:27.335244] ]
[Training] [2023-09-05T20:27:27.340238] losses:[
[Training] [2023-09-05T20:27:27.345236] text_ce:[
[Training] [2023-09-05T20:27:27.350230] type: direct
[Training] [2023-09-05T20:27:27.354226] weight: 0.01
[Training] [2023-09-05T20:27:27.358222] key: loss_text_ce
[Training] [2023-09-05T20:27:27.362220] ]
[Training] [2023-09-05T20:27:27.367215] mel_ce:[
[Training] [2023-09-05T20:27:27.373208] type: direct
[Training] [2023-09-05T20:27:27.378204] weight: 1
[Training] [2023-09-05T20:27:27.383200] key: loss_mel_ce
[Training] [2023-09-05T20:27:27.388194] ]
[Training] [2023-09-05T20:27:27.392191] ]
[Training] [2023-09-05T20:27:27.397187] ]
[Training] [2023-09-05T20:27:27.402182] ]
[Training] [2023-09-05T20:27:27.406178] networks:[
[Training] [2023-09-05T20:27:27.411173] gpt:[
[Training] [2023-09-05T20:27:27.416167] type: generator
[Training] [2023-09-05T20:27:27.421163] which_model_G: unified_voice2
[Training] [2023-09-05T20:27:27.425160] kwargs:[
[Training] [2023-09-05T20:27:27.431154] layers: 30
[Training] [2023-09-05T20:27:27.435151] model_dim: 1024
[Training] [2023-09-05T20:27:27.440147] heads: 16
[Training] [2023-09-05T20:27:27.444144] max_text_tokens: 402
[Training] [2023-09-05T20:27:27.449137] max_mel_tokens: 604
[Training] [2023-09-05T20:27:27.453133] max_conditioning_inputs: 2
[Training] [2023-09-05T20:27:27.458130] mel_length_compression: 1024
[Training] [2023-09-05T20:27:27.464124] number_text_tokens: 256
[Training] [2023-09-05T20:27:27.469119] number_mel_codes: 8194
[Training] [2023-09-05T20:27:27.474115] start_mel_token: 8192
[Training] [2023-09-05T20:27:27.479111] stop_mel_token: 8193
[Training] [2023-09-05T20:27:27.484104] start_text_token: 255
[Training] [2023-09-05T20:27:27.489101] train_solo_embeddings: False
[Training] [2023-09-05T20:27:27.495094] use_mel_codes_as_input: True
[Training] [2023-09-05T20:27:27.500091] checkpointing: True
[Training] [2023-09-05T20:27:27.505084] tortoise_compat: True
[Training] [2023-09-05T20:27:27.510082] ]
[Training] [2023-09-05T20:27:27.515075] ]
[Training] [2023-09-05T20:27:27.520071] ]
[Training] [2023-09-05T20:27:27.525066] path:[
[Training] [2023-09-05T20:27:27.530062] strict_load: True
[Training] [2023-09-05T20:27:27.535057] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-09-05T20:27:27.540052] root: ./
[Training] [2023-09-05T20:27:27.545049] experiments_root: ./training\markiplier\finetune
[Training] [2023-09-05T20:27:27.550044] models: ./training\markiplier\finetune\models
[Training] [2023-09-05T20:27:27.555039] training_state: ./training\markiplier\finetune\training_state
[Training] [2023-09-05T20:27:27.560034] log: ./training\markiplier\finetune
[Training] [2023-09-05T20:27:27.565028] val_images: ./training\markiplier\finetune\val_images
[Training] [2023-09-05T20:27:27.569025] ]
[Training] [2023-09-05T20:27:27.574021] train:[
[Training] [2023-09-05T20:27:27.579016] niter: 500
[Training] [2023-09-05T20:27:27.584012] warmup_iter: -1
[Training] [2023-09-05T20:27:27.588008] mega_batch_factor: 64
[Training] [2023-09-05T20:27:27.593005] val_freq: 25
[Training] [2023-09-05T20:27:27.599996] ema_enabled: False
[Training] [2023-09-05T20:27:27.604992] default_lr_scheme: MultiStepLR
[Training] [2023-09-05T20:27:27.609986] gen_lr_steps: [10, 20, 45, 90, 125, 165, 250]
[Training] [2023-09-05T20:27:27.614983] lr_gamma: 0.5
[Training] [2023-09-05T20:27:27.619979] ]
[Training] [2023-09-05T20:27:27.623975] eval:[
[Training] [2023-09-05T20:27:27.629970] pure: False
[Training] [2023-09-05T20:27:27.633966] output_state: gen
[Training] [2023-09-05T20:27:27.638961] ]
[Training] [2023-09-05T20:27:27.642957] logger:[
[Training] [2023-09-05T20:27:27.647952] save_checkpoint_freq: 25
[Training] [2023-09-05T20:27:27.652946] visuals: ['gen', 'mel']
[Training] [2023-09-05T20:27:27.658942] visual_debug_rate: 25
[Training] [2023-09-05T20:27:27.663936] is_mel_spectrogram: True
[Training] [2023-09-05T20:27:27.668931] ]
[Training] [2023-09-05T20:27:27.672928] is_train: True
[Training] [2023-09-05T20:27:27.676924] dist: False
[Training] [2023-09-05T20:27:27.682919]
[Training] [2023-09-05T20:27:27.686916] 23-09-05 20:27:26.897 - INFO: Random seed: 8574
[Training] [2023-09-05T20:27:29.059635] 23-09-05 20:27:29.059 - INFO: Number of training data elements: 541, iters: 5
[Training] [2023-09-05T20:27:29.065629] 23-09-05 20:27:29.059 - INFO: Total epochs needed: 100 for iters 500
[Training] [2023-09-05T20:27:31.678695] D:\TorToise\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-09-05T20:27:31.686686] warnings.warn(
[Training] [2023-09-05T20:28:12.404770] 23-09-05 20:28:12.403 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-09-05T20:28:13.941333] 23-09-05 20:28:13.933 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-09-05T20:28:17.386937] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-09-05T20:28:21.151689] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-09-05T20:28:22.950009] D:\TorToise\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-09-05T20:28:22.950009] warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-09-05T20:55:58.183015] Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu