Deepspeed - Windows (Yes I know) #404
Reference: mrq/ai-voice-cloning#404
After seeing #370 and playing around, I actually managed to get DeepSpeed working on my Windows install after installing deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl, running
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2+cu117 --index-url https://download.pytorch.org/whl/cu117
and changing
python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
to
python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
in setup-cuda.bat, running that script, then turning on LowVRAM and restarting.
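For anyone repeating these steps, a quick way to confirm that the cu117 build of torch actually ended up in the venv is to look at the local part of the installed version string. This is only a rough sketch: the helper names `cuda_tag` and `installed_torch_cuda_tag` are made up for illustration, and it assumes the PyTorch wheels report a version like `2.0.1+cu117`, as in the pip command above.

```python
from importlib import metadata


def cuda_tag(version: str):
    """Return the CUDA build tag (e.g. 'cu117') from a version string, or None."""
    # PyTorch CUDA wheels carry the CUDA build in the local version segment after '+'.
    _, sep, local = version.partition("+")
    if sep and local.startswith("cu"):
        return local
    return None


def installed_torch_cuda_tag():
    """Return the CUDA tag of the installed torch package, or None if absent."""
    try:
        return cuda_tag(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        # torch is not installed in this environment.
        return None


if __name__ == "__main__":
    tag = installed_torch_cuda_tag()
    print("torch CUDA build:", tag)
    if tag != "cu117":
        print("Expected cu117 for the setup described above.")
```

Run it with the venv's Python so it inspects the same environment the web UI uses.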
I was really surprised: generation took half the time when using DeepSpeed. Something that previously took 30 seconds now takes only 15 seconds. That is pretty awesome.
However, whenever I change models and attempt to generate, I get this message:
Stored autoregressive model to settings: ./training/queen2/finetune/models/111_gpt.pth
Loading autoregressive model: ./training/queen2/finetune/models/111_gpt.pth
[2023-10-07 06:19:49,779] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.8.3+6eca037c, git-hash=6eca037c, git-branch=master
[2023-10-07 06:19:49,780] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-07 06:19:49,780] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
Loaded autoregressive model
Unloaded Voicefixer
[1/1] Generating line: Wow this is quick
Loading voice: q2 with model 1dd22c6a
Loading voice: q2
Reading from latent: ./voices/q2//cond_latents_1dd22c6a.pth
Traceback (most recent call last):
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
response = fn(*args)
File "F:\SD\ai-voice-cloning\src\webui.py", line 94, in generate_proxy
raise e
File "F:\SD\ai-voice-cloning\src\webui.py", line 88, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "F:\SD\ai-voice-cloning\src\utils.py", line 351, in generate
return generate_tortoise(**kwargs)
File "F:\SD\ai-voice-cloning\src\utils.py", line 1211, in generate_tortoise
gen, additionals = tts.tts(cut_text, **settings )
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "f:\sd\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 746, in tts
codes = self.autoregressive.inference_speech(auto_conditioning, text_tokens,
File "f:\sd\ai-voice-cloning\modules\tortoise-tts\tortoise\models\autoregressive.py", line 526, in inference_speech
gen = self.inference_model.generate(inputs, bos_token_id=self.start_mel_token, pad_token_id=self.stop_mel_token, eos_token_id=self.stop_mel_token,
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\transformers\generation_utils.py", line 1310, in generate
return self.sample(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\transformers\generation_utils.py", line 1926, in sample
outputs = self(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "f:\sd\ai-voice-cloning\modules\tortoise-tts\tortoise\models\autoregressive.py", line 147, in forward
transformer_outputs = self.transformer(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 889, in forward
outputs = block(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\deepspeed\model_implementations\transformers\ds_transformer.py", line 140, in forward
self.attention(input,
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\deepspeed\ops\transformer\inference\ds_attention.py", line 123, in forward
context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\deepspeed\ops\transformer\inference\ds_attention.py", line 78, in compute_attention
attn_key_value = self.score_context_func(
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\SD\ai-voice-cloning\venv\lib\site-packages\deepspeed\ops\transformer\inference\op_binding\softmax_context.py", line 31, in forward
output = self.softmax_context_func(query_key_value, attn_mask, self.config.rotary_dim, self.config.rotate_half,
RuntimeError: The specified pointer resides on host memory and is not registered with any CUDA device.
I tend to change models a lot. Is there a way around this, or is this probably an issue with the version of DeepSpeed I'm using?
After some more testing, I seem to hit OOM quite often. I think I might try WSL.
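To see how close a run is to OOM, it can help to print the free VRAM right before generating. Below is a minimal sketch using `torch.cuda.mem_get_info()`; the helper name `free_vram_mib` is hypothetical and not part of ai-voice-cloning, and it simply returns None when torch or a CUDA device is not available.

```python
def free_vram_mib():
    """Return (free, total) VRAM in MiB for the current CUDA device, or None."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info() reports free and total device memory in bytes.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return free_bytes // 2**20, total_bytes // 2**20


if __name__ == "__main__":
    info = free_vram_mib()
    if info is None:
        print("No CUDA device visible to torch.")
    else:
        print(f"Free VRAM: {info[0]} MiB of {info[1]} MiB")
```

This only reports what the driver sees at that moment, so it is a rough indicator rather than a prediction of whether a given generation will fit.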
It looks like the Win10 wheels for DeepSpeed were released. I tried following the tutorial by Jarod's Journey (https://www.youtube.com/watch?v=RVzpjYOV-Tk), but it still will not work. Is this something you could look into, MRQ?
WSL will be the easiest way if it is an option. You will need to restart the tool and WSL every hour or two, but aside from that, it will work, and a bit faster.
I managed to build a wheel for Windows using deepspeed-0.11.1 for Python 3.10 and CUDA 11.8.
DeepSpeed 0.11.1 allows me to change models without having to restart. Performance is pretty decent, but for my RTX 2080 Ti I need to switch on CUDA - Sysmem Fallback Policy in NVIDIA Control Panel to prevent OOM. However, I must remember to switch this off for training, otherwise training is epically slow.
I'm not sure where to put this wheel for other people to download. The maximum file size to attach here is 4MB and the file is 26.6MB.
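Anyone picking up a prebuilt wheel like this would want it to match their own interpreter, since the Python and platform tags are baked into the filename. Here is a rough sketch that splits a wheel filename into its tags and compares the Python tag to the running interpreter. The helper names are made up for illustration, the example filename is the 0.8.3 wheel from the first post (the 0.11.1 filename was not posted), and it assumes CPython.

```python
import sys


def wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version, python, abi and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform gives 5 or 6 fields.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    tags = wheel_tags("deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl")
    print(tags)
    if tags["python"] != interpreter_tag():
        print(f"Wheel targets {tags['python']}, but this is {interpreter_tag()}.")
```

pip performs a stricter version of this check itself, so this is mainly useful for spotting a mismatch before downloading a large file.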