CUDA out of memory error after installing and toggling on deepspeed #425
Reference: mrq/ai-voice-cloning#425
I installed DeepSpeed by following the instructions from this link:
https://github.com/microsoft/DeepSpeed/issues/2902#issuecomment-1530051657
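(Not in the original report, but as a hedged sanity check after installing from a prebuilt wheel like the one linked above, the install can be verified without loading any model onto the GPU — everything below is illustrative, not part of ai-voice-cloning itself:)

```python
# Sanity check: verify the DeepSpeed package is importable and report its
# version, without touching CUDA or loading a model.
import importlib.util

spec = importlib.util.find_spec("deepspeed")
if spec is None:
    print("deepspeed: not installed")
else:
    import deepspeed
    print("deepspeed:", deepspeed.__version__)
```

DeepSpeed also ships a `ds_report` command-line tool that prints the detected CUDA/torch versions and which C++/CUDA ops are compatible, which is useful when a prebuilt wheel was built against a different toolchain.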
[2023-10-23 11:49:34,127] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+e2383511, git-hash=e2383511, git-branch=master
[2023-10-23 11:49:34,128] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-23 11:49:34,128] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
No ROCm runtime is found, using ROCM_HOME='/opt/rocm'
Using /run/media/user/ehdd/ai-voice-cloning/models/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /run/media/user/ehdd/ai-voice-cloning/models/torch_extensions/py310_cu121/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0514678955078125 seconds
[2023-10-23 11:49:34,470] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
Using /run/media/user/ehdd/ai-voice-cloning/models/torch_extensions/py310_cu121 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.001470804214477539 seconds
Free memory : 11.151306 (GigaBytes)
Total memory: 15.697632 (GigaBytes)
Requested memory: 5.375000 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7fcb00000000
Traceback (most recent call last):
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
result = await self.call_function(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/helpers.py", line 587, in tracked_fn
response = fn(*args)
File "/run/media/user/hdd/ai-voice-cloning/src/webui.py", line 94, in generate_proxy
raise e
File "/run/media/user/hdd/ai-voice-cloning/src/webui.py", line 88, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 363, in generate
return generate_tortoise(**kwargs)
File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 1223, in generate_tortoise
gen, additionals = tts.tts(cut_text, **settings )
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 799, in tts
clvp = self.clvp(text_tokens.repeat(batch.shape[0], 1), batch, return_loss=False)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/clvp.py", line 134, in forward
speech_latents = self.to_speech_latent(masked_mean(self.speech_transformer(speech_emb, mask=voice_mask), voice_mask, dim=1))
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/arch_util.py", line 368, in forward
h = self.transformer(x, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/xtransformers.py", line 1252, in forward
x, intermediates = self.attn_layers(x, mask=mask, mems=mems, return_hiddens=True, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/xtransformers.py", line 981, in forward
out, inter, k, v = block(x, None, mask, None, attn_mask, self.pia_pos_emb, rotary_pos_emb,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/arch_util.py", line 345, in forward
return partial(x, *args)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/xtransformers.py", line 718, in forward
post_softmax_attn = attn.clone()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 184.00 MiB. GPU 0 has a total capacty of 15.70 GiB of which 182.88 MiB is free. Including non-PyTorch memory, this process has 15.28 GiB memory in use. Of the allocated memory 9.55 GiB is allocated by PyTorch, and 211.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
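(A possible workaround, suggested by the error message itself rather than by this repo: setting `PYTORCH_CUDA_ALLOC_CONF` with `max_split_size_mb` before torch initializes CUDA can reduce fragmentation when reserved-but-unallocated memory is large. A minimal sketch — the value 128 is an assumption to tune, and the placement "before importing torch in the entry script" is the key constraint:)

```python
import os

# Must be set before the first CUDA allocation (ideally before `import torch`
# in the entry script, e.g. src/webui.py, or exported in the shell that
# launches it). max_split_size_mb caps the block size the caching allocator
# will split, which can reduce fragmentation at a small performance cost.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

try:
    import torch
    if torch.cuda.is_available():
        # Release cached-but-unused blocks before a large generation step.
        torch.cuda.empty_cache()
except ImportError:
    pass  # torch not installed in this environment; the env var is still set
```

The same effect can be had with `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 ./start.sh` in the launching shell. Note that DeepSpeed's inference workspace alone requested ~5.4 GiB here (see the "Requested memory" line above), so on a 16 GiB card this may only delay, not prevent, the OOM; lowering batch size or disabling DeepSpeed for the CLVP pass may still be necessary.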
[1/1] Generating line: Your prompt here.