Commit 008a1f5f8f seems to have broken multi-GPU training on Windows due to lack of nccl support (#115)
Per the PyTorch documentation on torch.distributed, nccl is not supported on Windows, and consequently the training process fails to initialize when run with multiple GPUs.
The error produced is: "The client socket has failed to connect to [localhost]:1234"
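For reference, PyTorch exposes runtime checks for which distributed backends a given build actually ships. A minimal sketch (not from this repo) that demonstrates the missing nccl support on a Windows build:

```python
import torch.distributed as dist

# Report which torch.distributed backends this PyTorch build includes.
# On a stock Windows build, nccl is expected to come back False.
print("gloo available:", dist.is_gloo_available())
print("nccl available:", dist.is_nccl_available())
print("mpi available: ", dist.is_mpi_available())
```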
To be technical, there never was working multi-GPU training on Windows. I'll never be able to validate it myself for Windows, as my GPUs are two 6800XTs and a 2060.
However, I imagine you can edit ./src/train.py:74 to change nccl to whatever other backend; it's only nccl because that's what base DLAS used.
...which seems to leave only MPI, and only if you somehow compile PyTorch yourself.
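A hypothetical sketch of that edit (the actual contents of ./src/train.py:74 aren't quoted in this thread, so the surrounding call is assumed), using gloo, which PyTorch's own documentation lists as supported on Windows without a custom build:

```python
import torch.distributed as dist

# Hypothetical edit around ./src/train.py:74: replace the hard-coded "nccl"
# backend with "gloo", which stock PyTorch Windows builds do include.
# The env:// init method reads MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE
# from the environment, the usual convention for launcher-driven training.
dist.init_process_group(backend="gloo", init_method="env://")
```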
It worked quite well for me prior to the change, using 2x RTX 3060s and Windows 10.
Just to make sure I wasn't misremembering, I reverted to the previous commit (2feb6da0c0):

Not possible. The GPU count never got passed on Windows from the UI => train.bat => ./src/train.py. The launcher defaults to none, so it won't even bother using a job launcher. The training script will force one GPU, per this block:
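The referenced block isn't quoted in the thread; a hypothetical reconstruction of that kind of guard (the identifiers below are illustrative, not the repo's actual names):

```python
import platform

# Hypothetical reconstruction of the guard described above; the repo's real
# block isn't quoted here and its identifiers may differ.
if platform.system() == "Windows":
    launcher = "none"  # no job launcher is ever started on Windows
    gpus = 1           # so the GPU count is clamped to one
```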
Ahh, bugger. I could swear I saw a performance boost but it must have been from offloading everything else I was doing to the other GPU.
Will try in WSL, thanks!
Allegedly WSL2 does support nccl, per NVIDIA's doc/blog/guide. I'm not too well-versed in how robust WSL2 is, but I imagine just using the Linux install scripts will work.
Can confirm working, but only with the Ubuntu distribution and not out of the box (needs some massaging of library paths to get bitsandbytes to find CUDA).
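For anyone hitting the same thing: older bitsandbytes releases locate CUDA by scanning LD_LIBRARY_PATH at import time, so one workaround is to extend that variable before the import. The /usr/lib/wsl/lib location (where WSL2 normally exposes libcuda.so) is an assumption about a typical setup; adjust it to wherever your CUDA libraries actually live. A sketch:

```python
import os

# Assumption: bitsandbytes' CUDA detection scans LD_LIBRARY_PATH when the
# module is imported, so prepend the WSL2 driver library directory first.
# /usr/lib/wsl/lib is where WSL2 usually exposes libcuda.so; adjust as needed.
os.environ["LD_LIBRARY_PATH"] = (
    "/usr/lib/wsl/lib:" + os.environ.get("LD_LIBRARY_PATH", "")
)

import bitsandbytes  # noqa: E402  (must come after the env var tweak)
```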
Noted, I'll make sure to add that as a note in the wiki.