Spawning process: train.bat ./training/Bang_Shishigami/train.yaml
[Training] [2023-02-21T04:48:13.917570]
(venv) G:\Tortoise-TTS\ai-voice-cloning>call .\venv\Scripts\activate.bat

23-02-21 04:48:16.940 - INFO:
  name: Bang_Shishigami-finetune
  model: extensibletrainer
  scale: 1
  gpu_ids: [0]
  start_step: -1
  checkpointing_enabled: True
  fp16: False
  wandb: False
  use_tb_logger: True
  datasets:[
    train:[
      name: Bang_Shishigami-train
      n_workers: 8
      batch_size: 20
      mode: paired_voice_audio
      path: ./training/Bang_Shishigami/train.txt
      fetcher_mode: ['lj']
      phase: train
      max_wav_length: 255995
      max_text_length: 200
      sample_rate: 22050
      load_conditioning: True
      num_conditioning_candidates: 2
      conditioning_length: 44000
      use_bpe_tokenizer: True
      tokenizer_vocab: ./models/tortoise/bpe_lowercase_asr_256.json
      load_aligned_codes: False
      data_type: img
    ]
    val:[
      name: Bang_Shishigami-val
      n_workers: 1
      batch_size: 32
      mode: paired_voice_audio
      path: ./training/Bang_Shishigami/train.txt
      fetcher_mode: ['lj']
      phase: val
      max_wav_length: 255995
      max_text_length: 200
      sample_rate: 22050
      load_conditioning: True
      num_conditioning_candidates: 2
      conditioning_length: 44000
      use_bpe_tokenizer: True
      tokenizer_vocab: ./models/tortoise/bpe_lowercase_asr_256.json
      load_aligned_codes: False
      data_type: img
    ]
  ]
  steps:[
    gpt_train:[
      training: gpt
      loss_log_buffer: 500
      optimizer: adamw
      optimizer_params:[
        lr: 1e-05
        weight_decay: 0.01
        beta1: 0.9
        beta2: 0.96
      ]
      clip_grad_eps: 4
      injectors:[
        paired_to_mel:[
          type: torch_mel_spectrogram
          mel_norm_file: ./models/tortoise/clips_mel_norms.pth
          in: wav
          out: paired_mel
        ]
        paired_cond_to_mel:[
          type: for_each
          subtype: torch_mel_spectrogram
          mel_norm_file: ./models/tortoise/clips_mel_norms.pth
          in: conditioning
          out: paired_conditioning_mel
        ]
        to_codes:[
          type: discrete_token
          in: paired_mel
          out: paired_mel_codes
          dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
        ]
        paired_fwd_text:[
          type: generator
          generator: gpt
          in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
          out: ['loss_text_ce', 'loss_mel_ce', 'logits']
        ]
      ]
      losses:[
        text_ce:[
          type: direct
          weight: 0.01
          key: loss_text_ce
        ]
        mel_ce:[
          type: direct
          weight: 1
          key: loss_mel_ce
        ]
      ]
    ]
  ]
  networks:[
    gpt:[
      type: generator
      which_model_G: unified_voice2
      kwargs:[
        layers: 30
        model_dim: 1024
        heads: 16
        max_text_tokens: 402
        max_mel_tokens: 604
        max_conditioning_inputs: 2
        mel_length_compression: 1024
        number_text_tokens: 256
        number_mel_codes: 8194
        start_mel_token: 8192
        stop_mel_token: 8193
        start_text_token: 255
        train_solo_embeddings: False
        use_mel_codes_as_input: True
        checkpointing: True
      ]
    ]
  ]
  path:[
    pretrain_model_gpt: ./models/tortoise/autoregressive.pth
    strict_load: True
    root: G:\Tortoise-TTS\ai-voice-cloning
    experiments_root: G:\Tortoise-TTS\ai-voice-cloning\training\Bang_Shishigami-finetune
    models: G:\Tortoise-TTS\ai-voice-cloning\training\Bang_Shishigami-finetune\models
    training_state: G:\Tortoise-TTS\ai-voice-cloning\training\Bang_Shishigami-finetune\training_state
    log: G:\Tortoise-TTS\ai-voice-cloning\training\Bang_Shishigami-finetune
    val_images: G:\Tortoise-TTS\ai-voice-cloning\training\Bang_Shishigami-finetune\val_images
  ]
  train:[
    niter: 500
    warmup_iter: -1
    mega_batch_factor: 4
    val_freq: 500
    default_lr_scheme: MultiStepLR
    gen_lr_steps: [9, 18, 25, 33]
    lr_gamma: 0.5
  ]
  eval:[
    output_state: gen
    injectors:[
      gen_inj_eval:[
        type: generator
        generator: generator
        in: hq
        out: ['gen', 'codebook_commitment_loss']
      ]
    ]
  ]
  logger:[
    print_freq: 5
    save_checkpoint_freq: 5
    visuals: ['gen', 'mel']
    visual_debug_rate: 5
    is_mel_spectrogram: True
  ]
  is_train: True
  dist: False

23-02-21 04:48:17.203 - INFO: Random seed: 9846
23-02-21 04:48:18.141 - INFO: Number of training data elements: 20, iters: 1
23-02-21 04:48:18.141 - INFO: Total epochs needed: 500 for iters 500
23-02-21 04:48:18.142 - INFO: Number of val images in [Bang_Shishigami-val]: 20

G:\Tortoise-TTS\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
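The INFO lines above (1 iteration per epoch, 500 epochs for 500 iters) follow directly from the config values; a minimal sanity-check sketch, using only numbers copied from the config and log (variable names are mine, not the trainer's):

```python
import math

# Values taken from the config dump above
n_elements = 20              # "Number of training data elements: 20"
batch_size = 20              # datasets.train.batch_size
niter = 500                  # train.niter (total training iterations)
sample_rate = 22050          # datasets.train.sample_rate (Hz)
max_wav_length = 255995      # max clip length, in samples
conditioning_length = 44000  # conditioning clip length, in samples

iters_per_epoch = math.ceil(n_elements / batch_size)  # 1
epochs_needed = math.ceil(niter / iters_per_epoch)    # 500

max_clip_seconds = max_wav_length / sample_rate           # ~11.6 s
conditioning_seconds = conditioning_length / sample_rate  # ~2.0 s

print(iters_per_epoch, epochs_needed)
print(round(max_clip_seconds, 1), round(conditioning_seconds, 1))
```

Since the whole 20-clip dataset fits in one batch, each iteration is a full epoch, which is why the trainer reports 500 epochs for 500 iterations.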
  warnings.warn(

23-02-21 04:48:28.860 - INFO: Network gpt structure: DataParallel, with parameters: 421,526,786
23-02-21 04:48:28.860 - INFO: UnifiedVoice(
  (conditioning_encoder): ConditioningEncoder(
    (init): Conv1d(80, 1024, kernel_size=(1,), stride=(1,))
    (attn): Sequential(
      (0): AttentionBlock(
        (norm): GroupNorm32(32, 1024, eps=1e-05, affine=True)
        (qkv): Conv1d(1024, 3072, kernel_size=(1,), stride=(1,))
        (attention): QKVAttentionLegacy()
        (x_proj): Identity()
        (proj_out): Conv1d(1024, 1024, kernel_size=(1,), stride=(1,))
      )
      (1)-(5): five more identical AttentionBlock modules (repeated printout omitted)
    )
  )
  (text_embedding): Embedding(256, 1024)
  (mel_embedding): Embedding(8194, 1024)
  (gpt): GPT2Model(
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0): GPT2Block(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (1)-(20): identical GPT2Block modules (repeated printout omitted; the captured log cuts off partway through block (20))
inplace=False) [Training] [2023-02-21T04:48:29.919175] ) [Training] [2023-02-21T04:48:29.921176] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:29.924176] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:29.927177] (c_fc): Conv1D() [Training] [2023-02-21T04:48:29.931178] (c_proj): Conv1D() [Training] [2023-02-21T04:48:29.934179] (act): NewGELUActivation() [Training] [2023-02-21T04:48:29.936179] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:29.938180] ) [Training] [2023-02-21T04:48:29.941180] ) [Training] [2023-02-21T04:48:29.944180] (21): GPT2Block( [Training] [2023-02-21T04:48:29.949182] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:29.952183] (attn): GPT2Attention( [Training] [2023-02-21T04:48:29.955183] (c_attn): Conv1D() [Training] [2023-02-21T04:48:29.957184] (c_proj): Conv1D() [Training] [2023-02-21T04:48:29.962185] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:29.965185] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:29.967186] ) [Training] [2023-02-21T04:48:29.970187] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:29.972187] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:29.975188] (c_fc): Conv1D() [Training] [2023-02-21T04:48:29.979190] (c_proj): Conv1D() [Training] [2023-02-21T04:48:29.982189] (act): NewGELUActivation() [Training] [2023-02-21T04:48:29.986190] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:29.989191] ) [Training] [2023-02-21T04:48:29.991191] ) [Training] [2023-02-21T04:48:29.993192] (22): GPT2Block( [Training] [2023-02-21T04:48:29.998193] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.001194] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.004194] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.007195] (c_proj): Conv1D() [Training] 
[2023-02-21T04:48:30.010196] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.013197] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.015197] ) [Training] [2023-02-21T04:48:30.017197] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.020198] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.023198] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.026199] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.029200] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.032200] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.035201] ) [Training] [2023-02-21T04:48:30.037201] ) [Training] [2023-02-21T04:48:30.039202] (23): GPT2Block( [Training] [2023-02-21T04:48:30.042203] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.045204] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.048204] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.050205] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.052205] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.055206] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.059207] ) [Training] [2023-02-21T04:48:30.062208] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.065208] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.068209] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.071209] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.073210] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.076210] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.079211] ) [Training] [2023-02-21T04:48:30.081212] ) [Training] [2023-02-21T04:48:30.084212] (24): GPT2Block( [Training] [2023-02-21T04:48:30.088213] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.090214] (attn): GPT2Attention( 
[Training] [2023-02-21T04:48:30.093214] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.096215] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.099216] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.102216] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.104217] ) [Training] [2023-02-21T04:48:30.107218] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.110218] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.112219] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.115220] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.118220] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.121221] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.123221] ) [Training] [2023-02-21T04:48:30.125222] ) [Training] [2023-02-21T04:48:30.128222] (25): GPT2Block( [Training] [2023-02-21T04:48:30.130223] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.133223] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.136224] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.139225] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.141225] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.145227] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.150227] ) [Training] [2023-02-21T04:48:30.153228] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.156229] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.158229] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.160230] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.164230] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.168231] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.170232] ) [Training] [2023-02-21T04:48:30.173232] ) [Training] [2023-02-21T04:48:30.175233] (26): GPT2Block( [Training] [2023-02-21T04:48:30.177234] 
(ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.179234] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.181235] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.184235] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.187236] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.189236] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.191237] ) [Training] [2023-02-21T04:48:30.194238] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.197238] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.201239] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.203239] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.205240] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.207240] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.210241] ) [Training] [2023-02-21T04:48:30.214242] ) [Training] [2023-02-21T04:48:30.217242] (27): GPT2Block( [Training] [2023-02-21T04:48:30.220243] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.222244] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.225244] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.227245] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.230245] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.233246] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.236247] ) [Training] [2023-02-21T04:48:30.238247] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.240248] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.243248] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.247249] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.249250] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.252250] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.254251] ) [Training] 
[2023-02-21T04:48:30.257251] ) [Training] [2023-02-21T04:48:30.259252] (28): GPT2Block( [Training] [2023-02-21T04:48:30.263253] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.266253] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.269254] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.271255] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.274255] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.277256] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.281257] ) [Training] [2023-02-21T04:48:30.284257] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.286258] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.288258] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.290259] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.293260] (act): NewGELUActivation() [Training] [2023-02-21T04:48:30.296260] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.299261] ) [Training] [2023-02-21T04:48:30.302262] ) [Training] [2023-02-21T04:48:30.305262] (29): GPT2Block( [Training] [2023-02-21T04:48:30.307263] (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.310264] (attn): GPT2Attention( [Training] [2023-02-21T04:48:30.314265] (c_attn): Conv1D() [Training] [2023-02-21T04:48:30.316265] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.319265] (attn_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.322266] (resid_dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.325267] ) [Training] [2023-02-21T04:48:30.328267] (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.332268] (mlp): GPT2MLP( [Training] [2023-02-21T04:48:30.335269] (c_fc): Conv1D() [Training] [2023-02-21T04:48:30.337269] (c_proj): Conv1D() [Training] [2023-02-21T04:48:30.340270] (act): NewGELUActivation() 
[Training] [2023-02-21T04:48:30.345271] (dropout): Dropout(p=0.1, inplace=False) [Training] [2023-02-21T04:48:30.347272] ) [Training] [2023-02-21T04:48:30.350273] ) [Training] [2023-02-21T04:48:30.352273] ) [Training] [2023-02-21T04:48:30.355273] (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.358274] ) [Training] [2023-02-21T04:48:30.361275] (mel_pos_embedding): LearnedPositionEmbeddings( [Training] [2023-02-21T04:48:30.364275] (emb): Embedding(608, 1024) [Training] [2023-02-21T04:48:30.367276] ) [Training] [2023-02-21T04:48:30.370277] (text_pos_embedding): LearnedPositionEmbeddings( [Training] [2023-02-21T04:48:30.372277] (emb): Embedding(404, 1024) [Training] [2023-02-21T04:48:30.376278] ) [Training] [2023-02-21T04:48:30.378280] (final_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) [Training] [2023-02-21T04:48:30.381279] (text_head): Linear(in_features=1024, out_features=256, bias=True) [Training] [2023-02-21T04:48:30.384280] (mel_head): Linear(in_features=1024, out_features=8194, bias=True) [Training] [2023-02-21T04:48:30.386280] ) [Training] [2023-02-21T04:48:30.388281] 23-02-21 04:48:29.442 - INFO: Loading model for [./models/tortoise/autoregressive.pth] [Training] [2023-02-21T04:48:34.058108] 23-02-21 04:48:34.058 - INFO: Start training from epoch: 0, iter: -1 [Training] [2023-02-21T04:48:34.129123] Disabled distributed training. [Training] [2023-02-21T04:48:34.129123] Loading from ./models/tortoise/dvae.pth [Training] [2023-02-21T04:48:34.131124] WARNING! Unable to find EMA network! Starting a new EMA from given model parameters. 
[Training] [2023-02-21T04:48:34.134125]
[Training] [2023-02-21T04:48:43.202168] 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
[Training] [2023-02-21T04:49:06.929034] train(args.opt, args.launcher)
[Training] [2023-02-21T04:49:06.929034] File "G:\Tortoise-TTS\ai-voice-cloning\src\train.py", line 53, in train
[Training] [2023-02-21T04:49:06.930035] trainer.do_training()
[Training] [2023-02-21T04:49:06.930035] File "G:\Tortoise-TTS\ai-voice-cloning\./dlas\codes\train.py", line 330, in do_training
[Training] [2023-02-21T04:49:06.949038] self.do_step(train_data)
[Training] [2023-02-21T04:49:06.949038] File "G:\Tortoise-TTS\ai-voice-cloning\./dlas\codes\train.py", line 211, in do_step
[Training] [2023-02-21T04:49:06.952040] gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)
[Training] [2023-02-21T04:49:06.952040] File "G:\Tortoise-TTS\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 372, in optimize_parameters
[Training] [2023-02-21T04:49:06.965042] self.consume_gradients(state, step, it)
[Training] [2023-02-21T04:49:06.966042] File "G:\Tortoise-TTS\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 417, in consume_gradients
[Training] [2023-02-21T04:49:06.967043] step.do_step(it)
[Training] [2023-02-21T04:49:06.967043] File "G:\Tortoise-TTS\ai-voice-cloning\./dlas/codes\trainer\steps.py", line 359, in do_step
[Training] [2023-02-21T04:49:07.014053] self.scaler.step(opt)
[Training] [2023-02-21T04:49:07.014053] File "G:\Tortoise-TTS\ai-voice-cloning\venv\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 313, in step
[Training] [2023-02-21T04:49:07.023056] return optimizer.step(*args, **kwargs)
[Training] [2023-02-21T04:49:07.023056] File "G:\Tortoise-TTS\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py", line 68, in wrapper
[Training] [2023-02-21T04:49:07.024056] return wrapped(*args, **kwargs)
[Training] [2023-02-21T04:49:07.024056] File "G:\Tortoise-TTS\ai-voice-cloning\venv\lib\site-packages\torch\optim\optimizer.py", line 140, in wrapper
[Training] [2023-02-21T04:49:07.033057] out = func(*args, **kwargs)
[Training] [2023-02-21T04:49:07.034058] File "G:\Tortoise-TTS\ai-voice-cloning\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
[Training] [2023-02-21T04:49:07.035058] return func(*args, **kwargs)
[Training] [2023-02-21T04:49:07.035058] File "G:\Tortoise-TTS\ai-voice-cloning\venv\lib\site-packages\torch\optim\adamw.py", line 147, in step
[Training] [2023-02-21T04:49:07.036058] state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
[Training] [2023-02-21T04:49:07.037058] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 10.00 GiB total capacity; 5.88 GiB already allocated; 0 bytes free; 6.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
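The run dies in `torch.cuda.OutOfMemoryError` while AdamW allocates its `exp_avg` optimizer state, and the error text itself points at `max_split_size_mb`. A minimal sketch of that suggestion, assuming the variable is set in the environment before `train.bat` spawns the training process; the value `128` is illustrative, not a known-good setting for this GPU:

```shell
# Cap the size of blocks PyTorch's caching allocator will split,
# which can reduce fragmentation when reserved >> allocated memory.
# 128 (MiB) is an illustrative value; tune per the PyTorch docs.
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128"   # POSIX sh/bash
# On Windows cmd, before running train.bat:
#   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

PyTorch reads this variable when CUDA is initialized, so it must be in place before the training process starts, not set from inside it.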