Commit 008a1f5f8f seems to have broken multi-GPU training on Windows due to lack of nccl support (#115)
Per the PyTorch documentation on torch.distributed, nccl is not supported on Windows, and consequently the training process fails to initialize when run with multiple GPUs.
The error produced is: "The client socket has failed to connect to [localhost]:1234"
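For reference, PyTorch exposes runtime checks for which distributed backends a given build actually ships. A minimal sketch (not from this repo) that demonstrates the missing nccl support on a Windows build:

```python
import torch.distributed as dist

# Report which torch.distributed backends this PyTorch build includes.
# On a stock Windows build, nccl is expected to come back False.
print("gloo available:", dist.is_gloo_available())
print("nccl available:", dist.is_nccl_available())
print("mpi available: ", dist.is_mpi_available())
```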
To be technical, there never was working multi-GPU training on Windows. I'll never be able to validate it myself for Windows, as my GPUs are two 6800XTs and a 2060.
However, I imagine you can edit ./src/train.py:74 to change nccl to whatever other backend; it's only nccl because that's what base DLAS used.
...which seems to leave only MPI, and only if you somehow compile PyTorch yourself.
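A hypothetical sketch of that edit (the actual contents of ./src/train.py:74 aren't quoted in this thread, so the surrounding call is assumed), using gloo, which PyTorch's own documentation lists as supported on Windows without a custom build:

```python
import torch.distributed as dist

# Hypothetical edit around ./src/train.py:74: replace the hard-coded "nccl"
# backend with "gloo", which stock PyTorch Windows builds do include.
# The env:// init method reads MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE
# from the environment, the usual convention for launcher-driven training.
dist.init_process_group(backend="gloo", init_method="env://")
```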
It worked quite well for me prior to the change, using 2x RTX 3060s and Windows 10.
Just to make sure I wasn't misremembering, I reverted to the previous commit (2feb6da0c0):

Not possible. The GPU count never got passed on Windows from the UI => train.bat => ./src/train.py. The launcher defaults to none, so it won't even bother using a job launcher. The training script will force one GPU, per this block:
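The referenced block isn't quoted in the thread; a hypothetical reconstruction of that kind of guard (the identifiers below are illustrative, not the repo's actual names):

```python
import platform

# Hypothetical reconstruction of the guard described above; the repo's real
# block isn't quoted here and its identifiers may differ.
if platform.system() == "Windows":
    launcher = "none"  # no job launcher is ever started on Windows
    gpus = 1           # so the GPU count is clamped to one
```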
Ahh, bugger. I could swear I saw a performance boost but it must have been from offloading everything else I was doing to the other GPU.
Will try in WSL, thanks!
Allegedly WSL2 does support nccl, per NVIDIA's doc/blog/guide. I'm not too well-versed in how robust WSL2 is, but I imagine just using the Linux install scripts will work.
Can confirm working, but only with the Ubuntu distribution and not out of the box (needs some massaging of library paths to get bitsandbytes to find CUDA).
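For anyone hitting the same thing: older bitsandbytes releases locate CUDA by scanning LD_LIBRARY_PATH at import time, so one workaround is to extend that variable before the import. The /usr/lib/wsl/lib location (where WSL2 normally exposes libcuda.so) is an assumption about a typical setup; adjust it to wherever your CUDA libraries actually live. A sketch:

```python
import os

# Assumption: bitsandbytes' CUDA detection scans LD_LIBRARY_PATH when the
# module is imported, so prepend the WSL2 driver library directory first.
# /usr/lib/wsl/lib is where WSL2 usually exposes libcuda.so; adjust as needed.
os.environ["LD_LIBRARY_PATH"] = (
    "/usr/lib/wsl/lib:" + os.environ.get("LD_LIBRARY_PATH", "")
)

import bitsandbytes  # noqa: E402  (must come after the env var tweak)
```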
Noted, I'll make sure to add that as a note in the wiki.