RuntimeError: Error building extension 'transformer_inference' #424
Reference: mrq/ai-voice-cloning#424
I got this error message after toggling on DeepSpeed in the settings.
Using /run/media/user/hdd/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /run/media/user/hdd/ai-voice-cloning/models/torch_extensions/py310_cu118/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -o rms_norm.cuda.o
FAILED: rms_norm.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -o rms_norm.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(178): error: no operator "+" matches these operands
operand types are: const __half + const __half
return lhs + rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(188): error: no operator ">" matches these operands
operand types are: const __half > const __half
return (lhs > rhs) ? lhs : rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(199): error: no operator "<" matches these operands
operand types are: const __half < const __half
return (lhs < rhs) ? lhs : rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(207): error: no operator "+" matches these operands
operand types are: const __half2 + const __half2
return lhs + rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(217): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.x = (lhs.x > rhs.x) ? lhs.x : rhs.x;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(218): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.y = (lhs.y > rhs.y) ? lhs.y : rhs.y;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(230): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.x = (lhs.x < rhs.x) ? lhs.x : rhs.x;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(231): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.y = (lhs.y < rhs.y) ? lhs.y : rhs.y;
^
8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu".
[2/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o
FAILED: layer_norm.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(178): error: no operator "+" matches these operands
operand types are: const __half + const __half
return lhs + rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(188): error: no operator ">" matches these operands
operand types are: const __half > const __half
return (lhs > rhs) ? lhs : rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(199): error: no operator "<" matches these operands
operand types are: const __half < const __half
return (lhs < rhs) ? lhs : rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(207): error: no operator "+" matches these operands
operand types are: const __half2 + const __half2
return lhs + rhs;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(217): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.x = (lhs.x > rhs.x) ? lhs.x : rhs.x;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(218): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.y = (lhs.y > rhs.y) ? lhs.y : rhs.y;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(230): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.x = (lhs.x < rhs.x) ? lhs.x : rhs.x;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(231): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.y = (lhs.y < rhs.y) ? lhs.y : rhs.y;
^
8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu".
[3/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o
FAILED: gelu.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(451): error: no operator "*" matches these operands
operand types are: __half * __half
mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
^
detected during:
instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 469
instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 479
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(451): error: no operator "*" matches these operands
operand types are: __half * __half
mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
^
detected during:
instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 469
instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 479
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(651): error: no operator "+" matches these operands
operand types are: __half + __half
T hidden_state = activation_buffer_1[v] + bias_buffer_1[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(652): error: no operator "+" matches these operands
operand types are: __half + __half
T pre_gate = activation_buffer_2[v] + bias_buffer_2[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(656): error: no operator "*" matches these operands
operand types are: __half * __half
activation_buffer_1[v] = hidden_state * gate;
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(651): error: no operator "+" matches these operands
operand types are: __half + __half
T hidden_state = activation_buffer_1[v] + bias_buffer_1[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(652): error: no operator "+" matches these operands
operand types are: __half + __half
T pre_gate = activation_buffer_2[v] + bias_buffer_2[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(656): error: no operator "*" matches these operands
operand types are: __half * __half
activation_buffer_1[v] = hidden_state * gate;
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu".
[4/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -o pointwise_ops.cuda.o
[5/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -o dequantize.cuda.o
[6/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -o relu.cuda.o
[7/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -o transform.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(37): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(65): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(107): warning #177-D: variable "half_dim" was declared but never referenced
unsigned half_dim = (rotary_dim << 3) >> 1;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(108): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(124): warning #177-D: variable "vals_half" was declared but never referenced
T2* vals_half = reinterpret_cast<T2*>(&vals_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(125): warning #177-D: variable "output_half" was declared but never referenced
T2* output_half = reinterpret_cast<T2*>(&output_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(142): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276
[8/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -o softmax.cuda.o
[9/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -o apply_rotary_pos_emb.cuda.o
[10/11] c++ -MMD -MF pt_binding.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -o pt_binding.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = float]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2006:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
538 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
539 | k * InferenceContext::Instance().GetMaxTokenLength(),
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  547 |         {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  548 |          k * InferenceContext::Instance().GetMaxTokenLength(),
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2006:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
1575 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
| ^~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = __half]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2007:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  538 |         {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  539 |          k * InferenceContext::Instance().GetMaxTokenLength(),
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  547 |         {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  548 |          k * InferenceContext::Instance().GetMaxTokenLength(),
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2007:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
1575 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
| ^~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/run/media/user/hdd/ai-voice-cloning/./src/main.py", line 27, in <module>
tts = load_tts()
File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 3666, in load_tts
tts = TorToise_TTS(minor_optimizations=not args.low_vram, autoregressive_model_path=autoregressive_model, diffusion_model_path=diffusion_model, vocoder_model=vocoder_model, tokenizer_json=tokenizer_json, unsqueeze_sample_batches=args.unsqueeze_sample_batches, use_deepspeed=args.use_deepspeed)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 308, in __init__
self.load_autoregressive_model(autoregressive_model_path)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 392, in load_autoregressive_model
self.autoregressive.post_init_gpt2_config(use_deepspeed=self.use_deepspeed, kv_cache=self.use_kv_cache)
File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/autoregressive.py", line 371, in post_init_gpt2_config
self.ds_engine = deepspeed.init_inference(model=self.inference_model,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 160, in __init__
self._apply_injection_policy(config)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 411, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 332, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 576, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 636, in _replace_module
_, layer_id = _replace_module(child,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 636, in _replace_module
_, layer_id = _replace_module(child,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 612, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 291, in replace_fn
new_module = replace_with_policy(child,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 246, in replace_with_policy
_container.create_module()
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/containers/gpt2.py", line 20, in create_module
self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 20, in __init__
super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
inference_module = builder.load()
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
op_module = load(name=self.name,
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
return _jit_compile(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
_write_ninja_file_and_build_library(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'
I figured out the problem. First, I activated the Python environment:
source ./venv/bin/activate
Then I installed torch, torchvision, and torchaudio with:
pip3 install torch torchvision torchaudio
Then I followed the instructions from this link. This also works with CUDA 12.2:
if you must use CUDA 12.1 on your system you have two options: