getting strange error when trying to train #371
Reference: mrq/ai-voice-cloning#371
This is the output I'm currently getting when trying to train. I'm not sure what else to share to help solve this — I've already tried reinstalling and using different CUDA versions. I'm on Windows 10 with a GTX 980 Ti:
Spawning process: train.bat ./training/markiplier/train.yaml
[Training] [2023-09-05T20:27:18.993858]
[Training] [2023-09-05T20:27:18.998854] (venv) D:\TorToise\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-09-05T20:27:22.547885] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-09-05T20:27:26.897653] 23-09-05 20:27:26.897 - INFO: name: markiplier
[Training] [2023-09-05T20:27:26.902648] model: extensibletrainer
[Training] [2023-09-05T20:27:26.907642] scale: 1
[Training] [2023-09-05T20:27:26.912638] gpu_ids: [0]
[Training] [2023-09-05T20:27:26.919633] start_step: 0
[Training] [2023-09-05T20:27:26.924628] checkpointing_enabled: True
[Training] [2023-09-05T20:27:26.930623] fp16: False
[Training] [2023-09-05T20:27:26.935616] bitsandbytes: True
[Training] [2023-09-05T20:27:26.939614] gpus: 1
[Training] [2023-09-05T20:27:26.943609] datasets:[
[Training] [2023-09-05T20:27:26.948606] train:[
[Training] [2023-09-05T20:27:26.953601] name: training
[Training] [2023-09-05T20:27:26.957596] n_workers: 2
[Training] [2023-09-05T20:27:26.961593] batch_size: 128
[Training] [2023-09-05T20:27:26.966589] mode: paired_voice_audio
[Training] [2023-09-05T20:27:26.971584] path: ./training/markiplier/train.txt
[Training] [2023-09-05T20:27:26.976578] fetcher_mode: ['lj']
[Training] [2023-09-05T20:27:26.981575] phase: train
[Training] [2023-09-05T20:27:26.986569] max_wav_length: 255995
[Training] [2023-09-05T20:27:26.990565] max_text_length: 200
[Training] [2023-09-05T20:27:26.995561] sample_rate: 22050
[Training] [2023-09-05T20:27:26.999558] load_conditioning: True
[Training] [2023-09-05T20:27:27.004604] num_conditioning_candidates: 2
[Training] [2023-09-05T20:27:27.008550] conditioning_length: 44000
[Training] [2023-09-05T20:27:27.016542] use_bpe_tokenizer: True
[Training] [2023-09-05T20:27:27.020537] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-09-05T20:27:27.027533] load_aligned_codes: False
[Training] [2023-09-05T20:27:27.035524] data_type: img
[Training] [2023-09-05T20:27:27.041519] ]
[Training] [2023-09-05T20:27:27.047516] val:[
[Training] [2023-09-05T20:27:27.052509] name: validation
[Training] [2023-09-05T20:27:27.058503] n_workers: 2
[Training] [2023-09-05T20:27:27.064496] batch_size: 2
[Training] [2023-09-05T20:27:27.069493] mode: paired_voice_audio
[Training] [2023-09-05T20:27:27.073489] path: ./training/markiplier/validation.txt
[Training] [2023-09-05T20:27:27.078484] fetcher_mode: ['lj']
[Training] [2023-09-05T20:27:27.083480] phase: val
[Training] [2023-09-05T20:27:27.087476] max_wav_length: 255995
[Training] [2023-09-05T20:27:27.092471] max_text_length: 200
[Training] [2023-09-05T20:27:27.097467] sample_rate: 22050
[Training] [2023-09-05T20:27:27.102462] load_conditioning: True
[Training] [2023-09-05T20:27:27.106456] num_conditioning_candidates: 2
[Training] [2023-09-05T20:27:27.111452] conditioning_length: 44000
[Training] [2023-09-05T20:27:27.116447] use_bpe_tokenizer: True
[Training] [2023-09-05T20:27:27.120443] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-09-05T20:27:27.125440] load_aligned_codes: False
[Training] [2023-09-05T20:27:27.131434] data_type: img
[Training] [2023-09-05T20:27:27.136430] ]
[Training] [2023-09-05T20:27:27.141423] ]
[Training] [2023-09-05T20:27:27.146421] steps:[
[Training] [2023-09-05T20:27:27.150416] gpt_train:[
[Training] [2023-09-05T20:27:27.155413] training: gpt
[Training] [2023-09-05T20:27:27.161408] loss_log_buffer: 500
[Training] [2023-09-05T20:27:27.166402] optimizer: adamw
[Training] [2023-09-05T20:27:27.170399] optimizer_params:[
[Training] [2023-09-05T20:27:27.175393] lr: 1e-05
[Training] [2023-09-05T20:27:27.180388] weight_decay: 0.01
[Training] [2023-09-05T20:27:27.185384] beta1: 0.9
[Training] [2023-09-05T20:27:27.190381] beta2: 0.96
[Training] [2023-09-05T20:27:27.195374] ]
[Training] [2023-09-05T20:27:27.200371] clip_grad_eps: 4
[Training] [2023-09-05T20:27:27.205365] injectors:[
[Training] [2023-09-05T20:27:27.209360] paired_to_mel:[
[Training] [2023-09-05T20:27:27.214358] type: torch_mel_spectrogram
[Training] [2023-09-05T20:27:27.219353] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-09-05T20:27:27.223348] in: wav
[Training] [2023-09-05T20:27:27.229343] out: paired_mel
[Training] [2023-09-05T20:27:27.234339] ]
[Training] [2023-09-05T20:27:27.238335] paired_cond_to_mel:[
[Training] [2023-09-05T20:27:27.243329] type: for_each
[Training] [2023-09-05T20:27:27.248326] subtype: torch_mel_spectrogram
[Training] [2023-09-05T20:27:27.253320] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-09-05T20:27:27.258315] in: conditioning
[Training] [2023-09-05T20:27:27.263310] out: paired_conditioning_mel
[Training] [2023-09-05T20:27:27.269307] ]
[Training] [2023-09-05T20:27:27.274302] to_codes:[
[Training] [2023-09-05T20:27:27.280296] type: discrete_token
[Training] [2023-09-05T20:27:27.284291] in: paired_mel
[Training] [2023-09-05T20:27:27.289286] out: paired_mel_codes
[Training] [2023-09-05T20:27:27.295282] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-09-05T20:27:27.300277] ]
[Training] [2023-09-05T20:27:27.304272] paired_fwd_text:[
[Training] [2023-09-05T20:27:27.309268] type: generator
[Training] [2023-09-05T20:27:27.314263] generator: gpt
[Training] [2023-09-05T20:27:27.319258] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-09-05T20:27:27.325253] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-09-05T20:27:27.330248] ]
[Training] [2023-09-05T20:27:27.335244] ]
[Training] [2023-09-05T20:27:27.340238] losses:[
[Training] [2023-09-05T20:27:27.345236] text_ce:[
[Training] [2023-09-05T20:27:27.350230] type: direct
[Training] [2023-09-05T20:27:27.354226] weight: 0.01
[Training] [2023-09-05T20:27:27.358222] key: loss_text_ce
[Training] [2023-09-05T20:27:27.362220] ]
[Training] [2023-09-05T20:27:27.367215] mel_ce:[
[Training] [2023-09-05T20:27:27.373208] type: direct
[Training] [2023-09-05T20:27:27.378204] weight: 1
[Training] [2023-09-05T20:27:27.383200] key: loss_mel_ce
[Training] [2023-09-05T20:27:27.388194] ]
[Training] [2023-09-05T20:27:27.392191] ]
[Training] [2023-09-05T20:27:27.397187] ]
[Training] [2023-09-05T20:27:27.402182] ]
[Training] [2023-09-05T20:27:27.406178] networks:[
[Training] [2023-09-05T20:27:27.411173] gpt:[
[Training] [2023-09-05T20:27:27.416167] type: generator
[Training] [2023-09-05T20:27:27.421163] which_model_G: unified_voice2
[Training] [2023-09-05T20:27:27.425160] kwargs:[
[Training] [2023-09-05T20:27:27.431154] layers: 30
[Training] [2023-09-05T20:27:27.435151] model_dim: 1024
[Training] [2023-09-05T20:27:27.440147] heads: 16
[Training] [2023-09-05T20:27:27.444144] max_text_tokens: 402
[Training] [2023-09-05T20:27:27.449137] max_mel_tokens: 604
[Training] [2023-09-05T20:27:27.453133] max_conditioning_inputs: 2
[Training] [2023-09-05T20:27:27.458130] mel_length_compression: 1024
[Training] [2023-09-05T20:27:27.464124] number_text_tokens: 256
[Training] [2023-09-05T20:27:27.469119] number_mel_codes: 8194
[Training] [2023-09-05T20:27:27.474115] start_mel_token: 8192
[Training] [2023-09-05T20:27:27.479111] stop_mel_token: 8193
[Training] [2023-09-05T20:27:27.484104] start_text_token: 255
[Training] [2023-09-05T20:27:27.489101] train_solo_embeddings: False
[Training] [2023-09-05T20:27:27.495094] use_mel_codes_as_input: True
[Training] [2023-09-05T20:27:27.500091] checkpointing: True
[Training] [2023-09-05T20:27:27.505084] tortoise_compat: True
[Training] [2023-09-05T20:27:27.510082] ]
[Training] [2023-09-05T20:27:27.515075] ]
[Training] [2023-09-05T20:27:27.520071] ]
[Training] [2023-09-05T20:27:27.525066] path:[
[Training] [2023-09-05T20:27:27.530062] strict_load: True
[Training] [2023-09-05T20:27:27.535057] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-09-05T20:27:27.540052] root: ./
[Training] [2023-09-05T20:27:27.545049] experiments_root: ./training\markiplier\finetune
[Training] [2023-09-05T20:27:27.550044] models: ./training\markiplier\finetune\models
[Training] [2023-09-05T20:27:27.555039] training_state: ./training\markiplier\finetune\training_state
[Training] [2023-09-05T20:27:27.560034] log: ./training\markiplier\finetune
[Training] [2023-09-05T20:27:27.565028] val_images: ./training\markiplier\finetune\val_images
[Training] [2023-09-05T20:27:27.569025] ]
[Training] [2023-09-05T20:27:27.574021] train:[
[Training] [2023-09-05T20:27:27.579016] niter: 500
[Training] [2023-09-05T20:27:27.584012] warmup_iter: -1
[Training] [2023-09-05T20:27:27.588008] mega_batch_factor: 64
[Training] [2023-09-05T20:27:27.593005] val_freq: 25
[Training] [2023-09-05T20:27:27.599996] ema_enabled: False
[Training] [2023-09-05T20:27:27.604992] default_lr_scheme: MultiStepLR
[Training] [2023-09-05T20:27:27.609986] gen_lr_steps: [10, 20, 45, 90, 125, 165, 250]
[Training] [2023-09-05T20:27:27.614983] lr_gamma: 0.5
[Training] [2023-09-05T20:27:27.619979] ]
[Training] [2023-09-05T20:27:27.623975] eval:[
[Training] [2023-09-05T20:27:27.629970] pure: False
[Training] [2023-09-05T20:27:27.633966] output_state: gen
[Training] [2023-09-05T20:27:27.638961] ]
[Training] [2023-09-05T20:27:27.642957] logger:[
[Training] [2023-09-05T20:27:27.647952] save_checkpoint_freq: 25
[Training] [2023-09-05T20:27:27.652946] visuals: ['gen', 'mel']
[Training] [2023-09-05T20:27:27.658942] visual_debug_rate: 25
[Training] [2023-09-05T20:27:27.663936] is_mel_spectrogram: True
[Training] [2023-09-05T20:27:27.668931] ]
[Training] [2023-09-05T20:27:27.672928] is_train: True
[Training] [2023-09-05T20:27:27.676924] dist: False
[Training] [2023-09-05T20:27:27.682919]
[Training] [2023-09-05T20:27:27.686916] 23-09-05 20:27:26.897 - INFO: Random seed: 8574
[Training] [2023-09-05T20:27:29.059635] 23-09-05 20:27:29.059 - INFO: Number of training data elements: 541, iters: 5
[Training] [2023-09-05T20:27:29.065629] 23-09-05 20:27:29.059 - INFO: Total epochs needed: 100 for iters 500
[Training] [2023-09-05T20:27:31.678695] D:\TorToise\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-09-05T20:27:31.686686]   warnings.warn(
[Training] [2023-09-05T20:28:12.404770] 23-09-05 20:28:12.403 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-09-05T20:28:13.941333] 23-09-05 20:28:13.933 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-09-05T20:28:17.386937] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-09-05T20:28:21.151689] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-09-05T20:28:22.950009] D:\TorToise\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-09-05T20:28:22.950009]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-09-05T20:55:58.183015] Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
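The final error comes from bitsandbytes (`bitsandbytes: True` in the config above): "no kernel image is available" means the prebuilt bitsandbytes binary was not compiled for this GPU's CUDA architecture. A GTX 980 Ti is a Maxwell card with compute capability (5, 2). The sketch below illustrates the check as a plain comparison; the cutoff `(7, 0)` is an illustrative assumption, since the real minimum depends on which architectures a given bitsandbytes build includes — it is not a documented constant.

```python
def bnb_kernels_likely_supported(capability, min_capability=(7, 0)):
    """Rough check: are CUDA kernels for this GPU likely present in a
    prebuilt bitsandbytes binary?

    `min_capability` is an illustrative assumption for this sketch; the
    actual cutoff varies by bitsandbytes version and build.
    """
    return capability >= min_capability

# GTX 980 Ti is Maxwell, compute capability (5, 2): below the assumed
# cutoff, consistent with the "no kernel image is available" error above.
print(bnb_kernels_likely_supported((5, 2)))  # False
print(bnb_kernels_likely_supported((8, 6)))  # True (e.g. an RTX 30-series card)
```

With torch installed you can read the real value via `torch.cuda.get_device_capability(0)`. If the capability is below what your bitsandbytes build supports, the usual workaround is to turn bitsandbytes off for training (the `bitsandbytes: True` line in the generated config) rather than chasing CUDA versions.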