Commit 008a1f5f8f Seems to have broken multi-GPU training on Windows due to lack of nccl support #115

Closed
opened 2023-03-11 18:47:31 +00:00 by psammites · 7 comments

Per [PyTorch's documentation on `torch.distributed`](https://pytorch.org/docs/stable/distributed.html), nccl is not supported on Windows, so the training process fails to initialize when run with multiple GPUs.

The error produced is: "The client socket has failed to connect to [localhost]:1234"
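A minimal repro of the failure mode, as a sketch under assumptions rather than the repo's exact code (DLAS drives this through its own config):

```
# Sketch only: approximates what DLAS attempts for multi-GPU training.
# Windows wheels of PyTorch are built without the NCCL backend, so the
# rendezvous at localhost:1234 can never complete.
import torch.distributed as dist

print(dist.is_nccl_available())  # False on Windows

dist.init_process_group(
    backend='nccl',
    init_method='tcp://localhost:1234',
    world_size=2,
    rank=0,
)  # fails: "The client socket has failed to connect to [localhost]:1234"
```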
Owner

> Seems to have broken multi-GPU training on Windows

To be technical, there never was. I'll never be able to validate it myself for Windows, as my GPUs are two 6800XTs and a 2060.

However, I imagine you can edit [./src/train.py:74](https://git.ecker.tech/mrq/ai-voice-cloning/src/branch/master/src/train.py#L74) to change nccl to whatever other backend. It's only that because that's what base DLAS used.

---

...which seems to leave only MPI, if you somehow compile PyTorch yourself.
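For what it's worth, a minimal sketch of that edit; this is an assumed shape, the actual call at line 74 may differ. PyTorch's Windows wheels do ship the `gloo` backend, so selecting it by platform sidesteps the nccl requirement:

```
# Hypothetical sketch of the edit around ./src/train.py:74 -- the exact
# call in the repo may differ. gloo is the backend PyTorch supports on
# Windows; NCCL is Linux-only.
import platform

import torch.distributed as dist

backend = 'gloo' if platform.system() == 'Windows' else 'nccl'
# rank/world_size are normally supplied by the job launcher via env vars
dist.init_process_group(backend=backend, init_method='env://')
```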

Author

> > Seems to have broken multi-GPU training on Windows
>
> To be technical, there never was. I'll never be able to validate it myself for Windows, as my GPUs are two 6800XTs and a 2060.
>
> However, I imagine you can edit [./src/train.py:74](https://git.ecker.tech/mrq/ai-voice-cloning/src/branch/master/src/train.py#L74) to change nccl to whatever other backend. It's only that because that's what base DLAS used.
>
> ---
>
> ...which seems to leave only MPI, if you somehow compile PyTorch yourself.

It worked quite well for me prior to the change, using 2x RTX 3060s on Windows 10.

Just to make sure I wasn't misremembering, I reverted to the previous commit (2feb6da0c0):

```
PS D:\ai-voice-cloning> git reset --hard 2feb6da0c0a36cf0139186b7b093927591465658
HEAD is now at 2feb6da cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription
PS D:\ai-voice-cloning> .\start.bat

D:\ai-voice-cloning>call .\venv\Scripts\activate.bat
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

['text', 'delimiter', 'emotion', 'prompt', 'voice', 'mic_audio', 'voice_latents_chunks', 'candidates', 'seed', 'num_autoregressive_samples', 'diffusion_iterations', 'temperature', 'diffusion_sampler', 'breathing_room', 'cvvp_weight', 'top_p', 'diffusion_temperature', 'length_penalty', 'repetition_penalty', 'cond_free_k', 'experimentals']
{'text': None, 'delimiter': None, 'emotion': None, 'prompt': None, 'voice': None, 'mic_audio': None, 'voice_latents_chunks': None, 'candidates': None, 'seed': None, 'num_autoregressive_samples': 16, 'diffusion_iterations': 30, 'temperature': 0.8, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'cvvp_weight': 0.0, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 2.0, 'cond_free_k': 2.0, 'experimentals': None}
[None, None, None, None, None, None, None, None, None, 16, 30, 0.8, 'DDIM', 8, 0.0, 0.8, 1.0, 1.0, 2.0, 2.0, None]
Loading Whisper model: large-v2
Loaded Whisper model
['text', 'delimiter', 'emotion', 'prompt', 'voice', 'mic_audio', 'voice_latents_chunks', 'candidates', 'seed', 'num_autoregressive_samples', 'diffusion_iterations', 'temperature', 'diffusion_sampler', 'breathing_room', 'cvvp_weight', 'top_p', 'diffusion_temperature', 'length_penalty', 'repetition_penalty', 'cond_free_k', 'experimentals']
{'text': None, 'delimiter': None, 'emotion': None, 'prompt': None, 'voice': None, 'mic_audio': None, 'voice_latents_chunks': None, 'candidates': None, 'seed': None, 'num_autoregressive_samples': 16, 'diffusion_iterations': 30, 'temperature': 0.8, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'cvvp_weight': 0.0, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 2.0, 'cond_free_k': 2.0, 'experimentals': None}
[None, None, None, None, None, None, None, None, None, 16, 30, 0.8, 'DDIM', 8, 0.0, 0.8, 1.0, 1.0, 2.0, 2.0, None]
Transcribed file: ./voices\Kiwi\kiwi.wav, 48 found.
Unloaded Whisper
Culled 3 lines
Culled 6 lines
Culled 6 lines
Culled 7 lines
Culled 16 lines
Spawning process:  train.bat ./training/Kiwi/train.yaml
[Training] [2023-03-11T11:31:16.779062]
[Training] [2023-03-11T11:31:16.784285] (venv) D:\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-03-11T11:31:20.181173] 23-03-11 11:31:20.180 - INFO:   name: Kiwi
[Training] [2023-03-11T11:31:20.186409]   model: extensibletrainer
[Training] [2023-03-11T11:31:20.191711]   scale: 1
[Training] [2023-03-11T11:31:20.197506]   gpu_ids: [0]
[Training] [2023-03-11T11:31:20.202260]   start_step: 0
[Training] [2023-03-11T11:31:20.207014]   checkpointing_enabled: True
[Training] [2023-03-11T11:31:20.212312]   fp16: False
[Training] [2023-03-11T11:31:20.217594]   bitsandbytes: True
[Training] [2023-03-11T11:31:20.222881]   gpus: 2
[Training] [2023-03-11T11:31:20.227635]   datasets:[
[Training] [2023-03-11T11:31:20.231852]     train:[
[Training] [2023-03-11T11:31:20.237150]       name: training
[Training] [2023-03-11T11:31:20.241957]       n_workers: 2
[Training] [2023-03-11T11:31:20.246724]       batch_size: 32
[Training] [2023-03-11T11:31:20.251995]       mode: paired_voice_audio
[Training] [2023-03-11T11:31:20.257301]       path: ./training/Kiwi/train.txt
[Training] [2023-03-11T11:31:20.261497]       fetcher_mode: ['lj']
[Training] [2023-03-11T11:31:20.267319]       phase: train
[Training] [2023-03-11T11:31:20.272613]       max_wav_length: 255995
[Training] [2023-03-11T11:31:20.277890]       max_text_length: 200
[Training] [2023-03-11T11:31:20.283200]       sample_rate: 22050
[Training] [2023-03-11T11:31:20.287976]       load_conditioning: True
[Training] [2023-03-11T11:31:20.292200]       num_conditioning_candidates: 2
[Training] [2023-03-11T11:31:20.296440]       conditioning_length: 44000
[Training] [2023-03-11T11:31:20.302276]       use_bpe_tokenizer: True
[Training] [2023-03-11T11:31:20.307020]       tokenizer_vocab: ./models/tortoise/bpe_lowercase_asr_256.json
[Training] [2023-03-11T11:31:20.311244]       load_aligned_codes: False
[Training] [2023-03-11T11:31:20.316021]       data_type: img
[Training] [2023-03-11T11:31:20.322429]     ]
[Training] [2023-03-11T11:31:20.327176]     val:[
[Training] [2023-03-11T11:31:20.332460]       name: validation
[Training] [2023-03-11T11:31:20.337245]       n_workers: 2
[Training] [2023-03-11T11:31:20.343082]       batch_size: 6
[Training] [2023-03-11T11:31:20.347300]       mode: paired_voice_audio
[Training] [2023-03-11T11:31:20.352062]       path: ./training/Kiwi/validation.txt
[Training] [2023-03-11T11:31:20.357349]       fetcher_mode: ['lj']
[Training] [2023-03-11T11:31:20.361574]       phase: val
[Training] [2023-03-11T11:31:20.365805]       max_wav_length: 255995
[Training] [2023-03-11T11:31:20.370559]       max_text_length: 200
[Training] [2023-03-11T11:31:20.374784]       sample_rate: 22050
[Training] [2023-03-11T11:31:20.379001]       load_conditioning: True
[Training] [2023-03-11T11:31:20.383768]       num_conditioning_candidates: 2
[Training] [2023-03-11T11:31:20.389048]       conditioning_length: 44000
[Training] [2023-03-11T11:31:20.393782]       use_bpe_tokenizer: True
[Training] [2023-03-11T11:31:20.398022]       tokenizer_vocab: ./models/tortoise/bpe_lowercase_asr_256.json
[Training] [2023-03-11T11:31:20.403361]       load_aligned_codes: False
[Training] [2023-03-11T11:31:20.408130]       data_type: img
[Training] [2023-03-11T11:31:20.412364]     ]
[Training] [2023-03-11T11:31:20.416039]   ]
[Training] [2023-03-11T11:31:20.420291]   steps:[
[Training] [2023-03-11T11:31:20.423957]     gpt_train:[
[Training] [2023-03-11T11:31:20.427137]       training: gpt
[Training] [2023-03-11T11:31:20.430822]       loss_log_buffer: 500
[Training] [2023-03-11T11:31:20.434503]       optimizer: adamw
[Training] [2023-03-11T11:31:20.438735]       optimizer_params:[
[Training] [2023-03-11T11:31:20.443515]         lr: 1e-05
[Training] [2023-03-11T11:31:20.447747]         weight_decay: 0.01
[Training] [2023-03-11T11:31:20.451993]         beta1: 0.9
[Training] [2023-03-11T11:31:20.457261]         beta2: 0.96
[Training] [2023-03-11T11:31:20.462588]       ]
[Training] [2023-03-11T11:31:20.467883]       clip_grad_eps: 4
[Training] [2023-03-11T11:31:20.472630]       injectors:[
[Training] [2023-03-11T11:31:20.477393]         paired_to_mel:[
[Training] [2023-03-11T11:31:20.482172]           type: torch_mel_spectrogram
[Training] [2023-03-11T11:31:20.488030]           mel_norm_file: ./models/tortoise/clips_mel_norms.pth
[Training] [2023-03-11T11:31:20.492797]           in: wav
[Training] [2023-03-11T11:31:20.498093]           out: paired_mel
[Training] [2023-03-11T11:31:20.502860]         ]
[Training] [2023-03-11T11:31:20.507624]         paired_cond_to_mel:[
[Training] [2023-03-11T11:31:20.512940]           type: for_each
[Training] [2023-03-11T11:31:20.518262]           subtype: torch_mel_spectrogram
[Training] [2023-03-11T11:31:20.523578]           mel_norm_file: ./models/tortoise/clips_mel_norms.pth
[Training] [2023-03-11T11:31:20.527840]           in: conditioning
[Training] [2023-03-11T11:31:20.532608]           out: paired_conditioning_mel
[Training] [2023-03-11T11:31:20.537916]         ]
[Training] [2023-03-11T11:31:20.543741]         to_codes:[
[Training] [2023-03-11T11:31:20.548499]           type: discrete_token
[Training] [2023-03-11T11:31:20.554328]           in: paired_mel
[Training] [2023-03-11T11:31:20.558551]           out: paired_mel_codes
[Training] [2023-03-11T11:31:20.563340]           dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-03-11T11:31:20.567590]         ]
[Training] [2023-03-11T11:31:20.575504]         paired_fwd_text:[
[Training] [2023-03-11T11:31:20.581309]           type: generator
[Training] [2023-03-11T11:31:20.586618]           generator: gpt
[Training] [2023-03-11T11:31:20.595056]           in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-03-11T11:31:20.600899]           out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-03-11T11:31:20.606203]         ]
[Training] [2023-03-11T11:31:20.610427]       ]
[Training] [2023-03-11T11:31:20.614660]       losses:[
[Training] [2023-03-11T11:31:20.618894]         text_ce:[
[Training] [2023-03-11T11:31:20.624193]           type: direct
[Training] [2023-03-11T11:31:20.630016]           weight: 0.01
[Training] [2023-03-11T11:31:20.635323]           key: loss_text_ce
[Training] [2023-03-11T11:31:20.640593]         ]
[Training] [2023-03-11T11:31:20.645381]         mel_ce:[
[Training] [2023-03-11T11:31:20.651764]           type: direct
[Training] [2023-03-11T11:31:20.655998]           weight: 1
[Training] [2023-03-11T11:31:20.660230]           key: loss_mel_ce
[Training] [2023-03-11T11:31:20.663913]         ]
[Training] [2023-03-11T11:31:20.668176]       ]
[Training] [2023-03-11T11:31:20.672958]     ]
[Training] [2023-03-11T11:31:20.677175]   ]
[Training] [2023-03-11T11:31:20.681948]   networks:[
[Training] [2023-03-11T11:31:20.687256]     gpt:[
[Training] [2023-03-11T11:31:20.693065]       type: generator
[Training] [2023-03-11T11:31:20.698362]       which_model_G: unified_voice2
[Training] [2023-03-11T11:31:20.703169]       kwargs:[
[Training] [2023-03-11T11:31:20.707949]         layers: 30
[Training] [2023-03-11T11:31:20.713261]         model_dim: 1024
[Training] [2023-03-11T11:31:20.718548]         heads: 16
[Training] [2023-03-11T11:31:20.723301]         max_text_tokens: 402
[Training] [2023-03-11T11:31:20.728057]         max_mel_tokens: 604
[Training] [2023-03-11T11:31:20.733330]         max_conditioning_inputs: 2
[Training] [2023-03-11T11:31:20.738079]         mel_length_compression: 1024
[Training] [2023-03-11T11:31:20.743896]         number_text_tokens: 256
[Training] [2023-03-11T11:31:20.749199]         number_mel_codes: 8194
[Training] [2023-03-11T11:31:20.753941]         start_mel_token: 8192
[Training] [2023-03-11T11:31:20.759242]         stop_mel_token: 8193
[Training] [2023-03-11T11:31:20.764552]         start_text_token: 255
[Training] [2023-03-11T11:31:20.770378]         train_solo_embeddings: False
[Training] [2023-03-11T11:31:20.774597]         use_mel_codes_as_input: True
[Training] [2023-03-11T11:31:20.778811]         checkpointing: True
[Training] [2023-03-11T11:31:20.783576]         tortoise_compat: True
[Training] [2023-03-11T11:31:20.788837]       ]
[Training] [2023-03-11T11:31:20.794120]     ]
[Training] [2023-03-11T11:31:20.798342]   ]
[Training] [2023-03-11T11:31:20.803626]   path:[
[Training] [2023-03-11T11:31:20.808371]     strict_load: True
[Training] [2023-03-11T11:31:20.813136]     pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-03-11T11:31:20.817909]     root: ./
[Training] [2023-03-11T11:31:20.822663]     experiments_root: ./training\Kiwi\finetune
[Training] [2023-03-11T11:31:20.827391]     models: ./training\Kiwi\finetune\models
[Training] [2023-03-11T11:31:20.832656]     training_state: ./training\Kiwi\finetune\training_state
[Training] [2023-03-11T11:31:20.837424]     log: ./training\Kiwi\finetune
[Training] [2023-03-11T11:31:20.841667]     val_images: ./training\Kiwi\finetune\val_images
[Training] [2023-03-11T11:31:20.846937]   ]
[Training] [2023-03-11T11:31:20.852774]   train:[
[Training] [2023-03-11T11:31:20.857533]     niter: 500
[Training] [2023-03-11T11:31:20.861757]     warmup_iter: -1
[Training] [2023-03-11T11:31:20.865472]     mega_batch_factor: 5
[Training] [2023-03-11T11:31:20.870232]     val_freq: 5
[Training] [2023-03-11T11:31:20.875012]     ema_enabled: False
[Training] [2023-03-11T11:31:20.880293]     default_lr_scheme: MultiStepLR
[Training] [2023-03-11T11:31:20.885099]     gen_lr_steps: [9, 18, 25, 33]
[Training] [2023-03-11T11:31:20.889360]     lr_gamma: 0.5
[Training] [2023-03-11T11:31:20.894124]   ]
[Training] [2023-03-11T11:31:20.898878]   eval:[
[Training] [2023-03-11T11:31:20.903638]     pure: True
[Training] [2023-03-11T11:31:20.908948]     output_state: gen
[Training] [2023-03-11T11:31:20.913713]   ]
[Training] [2023-03-11T11:31:20.918486]   logger:[
[Training] [2023-03-11T11:31:20.922715]     save_checkpoint_freq: 100
[Training] [2023-03-11T11:31:20.928026]     visuals: ['gen', 'mel']
[Training] [2023-03-11T11:31:20.933375]     visual_debug_rate: 100
[Training] [2023-03-11T11:31:20.938679]     is_mel_spectrogram: True
[Training] [2023-03-11T11:31:20.943976]   ]
[Training] [2023-03-11T11:31:20.948226]   is_train: True
[Training] [2023-03-11T11:31:20.953543]   dist: False
[Training] [2023-03-11T11:31:20.958833]
[Training] [2023-03-11T11:31:20.963599] 23-03-11 11:31:20.181 - INFO: Random seed: 5371
[Training] [2023-03-11T11:31:21.327598] 23-03-11 11:31:21.327 - INFO: Number of training data elements: 32, iters: 1
[Training] [2023-03-11T11:31:21.332884] 23-03-11 11:31:21.327 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-03-11T11:31:21.338187] 23-03-11 11:31:21.328 - INFO: Number of val images in [validation]: 16
[Training] [2023-03-11T11:31:22.513448] D:\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-03-11T11:31:22.519745]   warnings.warn(
[Training] [2023-03-11T11:31:34.062301] 23-03-11 11:31:34.061 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-03-11T11:31:35.259653] 23-03-11 11:31:35.259 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-03-11T11:31:39.787446] D:\ai-voice-cloning\./modules/dlas/codes\models\audio\tts\tacotron2\taco_utils.py:17: WavFileWarning: Chunk (non-data) not understood, skipping it.
[Training] [2023-03-11T11:31:39.787971]   sampling_rate, data = read(full_path)
[Training] [2023-03-11T11:31:40.588502] D:\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-03-11T11:31:40.588517]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-03-11T11:31:49.033178] 23-03-11 11:31:49.032 - INFO: Training Metrics: {"loss_text_ce": 6.235421657562256, "loss_mel_ce": 2.967024803161621, "loss_gpt_total": 3.029378890991211, "lr": 1e-05, "it": 1, "step": 1, "steps": 1, "epoch": 0, "iteration_rate": 0.26371678709983826}
```
Owner

> It worked quite well for me prior to the change, using 2x RTX 3060s on Windows 10.

Not possible. The GPU count never got passed on Windows from the UI => `train.bat` => `./src/train.py`. The launcher defaults to `none`, so it won't even bother using a job launcher.

The training script will force one GPU, per this block:

```
if launcher == 'none':  # disabled distributed training
    opt['dist'] = False
    trainer.rank = -1
    if len(opt['gpu_ids']) == 1:
        torch.cuda.set_device(opt['gpu_ids'][0])
    print('Disabled distributed training.')
```
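For contrast, a hypothetical sketch of what a real job launcher sets up: one process per GPU, each joining the process group under its own rank. This is illustration only, not how `train.bat` invokes `./src/train.py`:

```
# Illustration only (not the repo's code): a launcher spawns one process
# per GPU, and each process joins the group with its own rank.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '1234'
    dist.init_process_group('gloo', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... training steps, model wrapped in DistributedDataParallel ...
    dist.destroy_process_group()

if __name__ == '__main__':
    mp.spawn(worker, args=(2,), nprocs=2)  # 2 = number of GPUs
```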
Author

Ahh, bugger. I could swear I saw a performance boost but it must have been from offloading everything else I was doing to the other GPU.

Will try in WSL, thanks!

Owner

Allegedly WSL2 does support nccl, per [NVIDIA's doc/blog/guide](https://docs.nvidia.com/cuda/wsl-user-guide/index.html). I'm not too well-versed in how robust WSL2 is, but I imagine just using the Linux install scripts will work.

mrq closed this issue 2023-03-13 17:43:38 +00:00
Author

> Allegedly WSL2 does support nccl, per [NVIDIA's doc/blog/guide](https://docs.nvidia.com/cuda/wsl-user-guide/index.html). I'm not too well-versed in how robust WSL2 is, but I imagine just using the Linux install scripts will work.

Can confirm it works, but only with the Ubuntu distribution and not out of the box (it needs some massaging of library paths so bitsandbytes can find CUDA).
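For anyone who hits the same wall, a quick sanity check to run inside WSL2 before training; the library-path fix is an assumption on my part and the exact path varies by driver install:

```
# Quick sanity check inside WSL2 Ubuntu. Assumption: the usual fix for
# bitsandbytes is pointing LD_LIBRARY_PATH at the WSL CUDA libraries,
# e.g. export LD_LIBRARY_PATH=/usr/lib/wsl/lib (path varies by setup).
import torch
import torch.distributed as dist

print(torch.cuda.is_available())   # True once WSL2 sees the GPUs
print(torch.cuda.device_count())   # should report both cards
print(dist.is_nccl_available())    # True on Linux builds of PyTorch
```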

Owner

Noted, I'll make sure to add that as a note in the wiki.
