CUDA out of memory error after installing and toggling on deepspeed #425

Open
opened 2023-10-23 18:30:47 +00:00 by Bluebomber182 · 0 comments

I installed DeepSpeed by following the instructions from this link:
https://github.com/microsoft/DeepSpeed/issues/2902#issuecomment-1530051657
[2023-10-23 11:49:34,127] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+e2383511, git-hash=e2383511, git-branch=master
[2023-10-23 11:49:34,128] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-23 11:49:34,128] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
No ROCm runtime is found, using ROCM_HOME='/opt/rocm'
Using /run/media/user/ehdd/ai-voice-cloning/models/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /run/media/user/ehdd/ai-voice-cloning/models/torch_extensions/py310_cu121/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0514678955078125 seconds
[2023-10-23 11:49:34,470] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
Using /run/media/user/ehdd/ai-voice-cloning/models/torch_extensions/py310_cu121 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.001470804214477539 seconds

Free memory : 11.151306 (GigaBytes)
Total memory: 15.697632 (GigaBytes)
Requested memory: 5.375000 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7fcb00000000

Traceback (most recent call last):
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
result = await self.call_function(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/gradio/helpers.py", line 587, in tracked_fn
response = fn(*args)
File "/run/media/user/hdd/ai-voice-cloning/src/webui.py", line 94, in generate_proxy
raise e
File "/run/media/user/hdd/ai-voice-cloning/src/webui.py", line 88, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 363, in generate
return generate_tortoise(**kwargs)
File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 1223, in generate_tortoise
gen, additionals = tts.tts(cut_text, **settings )
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 799, in tts
clvp = self.clvp(text_tokens.repeat(batch.shape[0], 1), batch, return_loss=False)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/clvp.py", line 134, in forward
speech_latents = self.to_speech_latent(masked_mean(self.speech_transformer(speech_emb, mask=voice_mask), voice_mask, dim=1))
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/arch_util.py", line 368, in forward
h = self.transformer(x, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/xtransformers.py", line 1252, in forward
x, intermediates = self.attn_layers(x, mask=mask, mems=mems, return_hiddens=True, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/xtransformers.py", line 981, in forward
out, inter, k, v = block(x, None, mask, None, attn_mask, self.pia_pos_emb, rotary_pos_emb,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/arch_util.py", line 345, in forward
return partial(x, *args)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/xtransformers.py", line 718, in forward
post_softmax_attn = attn.clone()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 184.00 MiB. GPU 0 has a total capacty of 15.70 GiB of which 182.88 MiB is free. Including non-PyTorch memory, this process has 15.28 GiB memory in use. Of the allocated memory 9.55 GiB is allocated by PyTorch, and 211.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
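Doing the arithmetic on the numbers in the error message and the DeepSpeed log above suggests the DeepSpeed inference workspace, not fragmentation, is eating the headroom: the process holds 15.28 GiB, PyTorch accounts for only ~9.76 GiB of that, and the remaining ~5.5 GiB closely matches DeepSpeed's "Requested memory: 5.375 (GigaBytes)" workspace plus the CUDA context. A quick sketch of that accounting (all figures copied from the log, units in GiB):

```python
# Figures taken verbatim from the OOM message and DeepSpeed log above.
process_in_use = 15.28            # "this process has 15.28 GiB memory in use"
pytorch_allocated = 9.55          # "9.55 GiB is allocated by PyTorch"
pytorch_reserved_unalloc = 211.48 / 1024  # "211.48 MiB is reserved but unallocated"

pytorch_reserved = pytorch_allocated + pytorch_reserved_unalloc
non_pytorch = process_in_use - pytorch_reserved

print(round(non_pytorch, 2))      # ~5.52 GiB outside PyTorch's allocator,
deepspeed_workspace = 5.375       # ...close to DeepSpeed's requested workspace
```

So toggling DeepSpeed on adds a ~5.4 GiB fixed workspace on top of the model, which on a 16 GiB card leaves too little for the CLVP attention buffers.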
[1/1] Generating line: Your prompt here.
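The error message itself suggests trying max_split_size_mb, though in this log only ~211 MiB is reserved-but-unallocated, so it may not help much here. For completeness, a minimal sketch of setting it (the value 128 is an assumption to tune, and it must be set before CUDA is initialized, i.e. before the first import of torch):

```python
import os

# Must run before the first `import torch` / any CUDA initialization.
# 128 MiB is an arbitrary starting value; tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # max_split_size_mb:128
```

Equivalently, export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 in the shell before launching the web UI.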

Bluebomber182 changed title from "CUDA out of memory error after installing deepspeed" to "CUDA out of memory error after installing and toggling on deepspeed" 2023-10-23 18:34:49 +00:00
Reference: mrq/ai-voice-cloning#425