RuntimeError: Error building extension 'transformer_inference' #424

Closed
opened 2023-10-22 01:57:32 +00:00 by Bluebomber182 · 1 comment

I got this error message after toggling on DeepSpeed in the settings.

Using /run/media/user/hdd/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /run/media/user/hdd/ai-voice-cloning/models/torch_extensions/py310_cu118/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -o rms_norm.cuda.o
FAILED: rms_norm.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -o rms_norm.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(178): error: no operator "+" matches these operands
operand types are: const __half + const __half
return lhs + rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(188): error: no operator ">" matches these operands
operand types are: const __half > const __half
return (lhs > rhs) ? lhs : rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(199): error: no operator "<" matches these operands
operand types are: const __half < const __half
return (lhs < rhs) ? lhs : rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(207): error: no operator "+" matches these operands
operand types are: const __half2 + const __half2
return lhs + rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(217): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.x = (lhs.x > rhs.x) ? lhs.x : rhs.x;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(218): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.y = (lhs.y > rhs.y) ? lhs.y : rhs.y;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(230): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.x = (lhs.x < rhs.x) ? lhs.x : rhs.x;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(231): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.y = (lhs.y < rhs.y) ? lhs.y : rhs.y;
^

8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu".
[2/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o
FAILED: layer_norm.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(178): error: no operator "+" matches these operands
operand types are: const __half + const __half
return lhs + rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(188): error: no operator ">" matches these operands
operand types are: const __half > const __half
return (lhs > rhs) ? lhs : rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(199): error: no operator "<" matches these operands
operand types are: const __half < const __half
return (lhs < rhs) ? lhs : rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(207): error: no operator "+" matches these operands
operand types are: const __half2 + const __half2
return lhs + rhs;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(217): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.x = (lhs.x > rhs.x) ? lhs.x : rhs.x;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(218): error: no operator ">" matches these operands
operand types are: const __half > const __half
ret_val.y = (lhs.y > rhs.y) ? lhs.y : rhs.y;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(230): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.x = (lhs.x < rhs.x) ? lhs.x : rhs.x;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(231): error: no operator "<" matches these operands
operand types are: const __half < const __half
ret_val.y = (lhs.y < rhs.y) ? lhs.y : rhs.y;
^

8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu".
[3/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o
FAILED: gelu.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(451): error: no operator "*" matches these operands
operand types are: __half * __half
mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
^
detected during:
instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 469
instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 479

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(451): error: no operator "*" matches these operands
operand types are: __half * __half
mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
^
detected during:
instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 469
instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 479

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(651): error: no operator "+" matches these operands
operand types are: __half + __half
T hidden_state = activation_buffer_1[v] + bias_buffer_1[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(652): error: no operator "+" matches these operands
operand types are: __half + __half
T pre_gate = activation_buffer_2[v] + bias_buffer_2[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(656): error: no operator "*" matches these operands
operand types are: __half * __half
activation_buffer_1[v] = hidden_state * gate;
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(651): error: no operator "+" matches these operands
operand types are: __half + __half
T hidden_state = activation_buffer_1[v] + bias_buffer_1[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(652): error: no operator "+" matches these operands
operand types are: __half + __half
T pre_gate = activation_buffer_2[v] + bias_buffer_2[v];
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(656): error: no operator "*" matches these operands
operand types are: __half * __half
activation_buffer_1[v] = hidden_state * gate;
^
detected during:
instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706

8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu".
[4/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -o pointwise_ops.cuda.o
[5/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -o dequantize.cuda.o
[6/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -o relu.cuda.o
[7/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -o transform.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(37): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(65): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(107): warning #177-D: variable "half_dim" was declared but never referenced
unsigned half_dim = (rotary_dim << 3) >> 1;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(108): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(124): warning #177-D: variable "vals_half" was declared but never referenced
T2* vals_half = reinterpret_cast<T2*>(&vals_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(125): warning #177-D: variable "output_half" was declared but never referenced
T2* output_half = reinterpret_cast<T2*>(&output_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(142): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276

[8/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -o softmax.cuda.o
[9/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -o apply_rotary_pos_emb.cuda.o
[10/11] c++ -MMD -MF pt_binding.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -o pt_binding.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = float]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2006:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
538 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
539 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
547 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
548 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2006:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
1575 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
| ^~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = __half]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2007:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
538 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
539 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
547 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
548 | k * InferenceContext::Instance().GetMaxTokenLength(),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’:
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2007:5: required from here
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
1575 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
| ^~~~~~~~~~~~~~~~~
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/run/media/user/hdd/ai-voice-cloning/./src/main.py", line 27, in <module>
    tts = load_tts()
  File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 3666, in load_tts
    tts = TorToise_TTS(minor_optimizations=not args.low_vram, autoregressive_model_path=autoregressive_model, diffusion_model_path=diffusion_model, vocoder_model=vocoder_model, tokenizer_json=tokenizer_json, unsqueeze_sample_batches=args.unsqueeze_sample_batches, use_deepspeed=args.use_deepspeed)
  File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 308, in __init__
    self.load_autoregressive_model(autoregressive_model_path)
  File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 392, in load_autoregressive_model
    self.autoregressive.post_init_gpt2_config(use_deepspeed=self.use_deepspeed, kv_cache=self.use_kv_cache)
  File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/autoregressive.py", line 371, in post_init_gpt2_config
    self.ds_engine = deepspeed.init_inference(model=self.inference_model,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 160, in __init__
    self._apply_injection_policy(config)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 411, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 332, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 576, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 636, in _replace_module
    _, layer_id = _replace_module(child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 636, in _replace_module
    _, layer_id = _replace_module(child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 612, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 291, in replace_fn
    new_module = replace_with_policy(child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 246, in replace_with_policy
    _container.create_module()
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/containers/gpt2.py", line 20, in create_module
    self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 20, in __init__
    super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
    inference_module = builder.load()
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
    return self.jit_load(verbose)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
    op_module = load(name=self.name,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'

/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -o rms_norm.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(178): error: no operator "+" matches these operands
            operand types are: const __half + const __half
      return lhs + rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(188): error: no operator ">" matches these operands
            operand types are: const __half > const __half
      return (lhs > rhs) ? lhs : rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(199): error: no operator "<" matches these operands
            operand types are: const __half < const __half
      return (lhs < rhs) ? lhs : rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(207): error: no operator "+" matches these operands
            operand types are: const __half2 + const __half2
      return lhs + rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(217): error: no operator ">" matches these operands
            operand types are: const __half > const __half
      ret_val.x = (lhs.x > rhs.x) ? lhs.x : rhs.x;
                  ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(218): error: no operator ">" matches these operands
            operand types are: const __half > const __half
      ret_val.y = (lhs.y > rhs.y) ? lhs.y : rhs.y;
                  ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(230): error: no operator "<" matches these operands
            operand types are: const __half < const __half
      ret_val.x = (lhs.x < rhs.x) ? lhs.x : rhs.x;
                  ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(231): error: no operator "<" matches these operands
            operand types are: const __half < const __half
      ret_val.y = (lhs.y < rhs.y) ? lhs.y : rhs.y;
                  ^
8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu".
[2/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o
FAILED: layer_norm.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(178): error: no operator "+" matches these operands
            operand types are: const __half + const __half
      return lhs + rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(188): error: no operator ">" matches these operands
            operand types are: const __half > const __half
      return (lhs > rhs) ? lhs : rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(199): error: no operator "<" matches these operands
            operand types are: const __half < const __half
      return (lhs < rhs) ? lhs : rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(207): error: no operator "+" matches these operands
            operand types are: const __half2 + const __half2
      return lhs + rhs;
             ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(217): error: no operator ">" matches these operands
            operand types are: const __half > const __half
      ret_val.x = (lhs.x > rhs.x) ? lhs.x : rhs.x;
                  ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(218): error: no operator ">" matches these operands
            operand types are: const __half > const __half
      ret_val.y = (lhs.y > rhs.y) ? lhs.y : rhs.y;
                  ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(230): error: no operator "<" matches these operands
            operand types are: const __half < const __half
      ret_val.x = (lhs.x < rhs.x) ? lhs.x : rhs.x;
                  ^
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes/reduction_utils.h(231): error: no operator "<" matches these operands
            operand types are: const __half < const __half
      ret_val.y = (lhs.y < rhs.y) ? lhs.y : rhs.y;
                  ^
8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu".
[3/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o
FAILED: gelu.cuda.o
/opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(451): error: no operator "*" matches these operands
            operand types are: __half * __half
      mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
                 ^
          detected during:
            instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 469
            instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 479
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(451): error: no operator "*" matches these operands
            operand types are: __half * __half
      mlp[idx] = mlp[idx] * coef2[idx] + res[idx] * coef1[idx];
                 ^
          detected during:
            instantiation of "void moe_res_matmul(T *, T *, T *, int, int) [with T=__half]" at line 469
            instantiation of "void launch_moe_res_matmul(T *, T *, T *, int, int, cudaStream_t) [with T=__half]" at line 479
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(651): error: no operator "+" matches these operands
            operand types are: __half + __half
      T hidden_state = activation_buffer_1[v] + bias_buffer_1[v];
                       ^
          detected during:
            instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
            instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(652): error: no operator "+" matches these operands
            operand types are: __half + __half
      T pre_gate = activation_buffer_2[v] + bias_buffer_2[v];
                   ^
          detected during:
            instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
            instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(656): error: no operator "*" matches these operands
            operand types are: __half * __half
      activation_buffer_1[v] = hidden_state * gate;
                               ^
          detected during:
            instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=true]" at line 695
            instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(651): error: no operator "+" matches these operands
            operand types are: __half + __half
      T hidden_state = activation_buffer_1[v] + bias_buffer_1[v];
                       ^
          detected during:
            instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
            instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(652): error: no operator "+" matches these operands
            operand types are: __half + __half
      T pre_gate = activation_buffer_2[v] + bias_buffer_2[v];
                   ^
          detected during:
            instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
            instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu(656): error: no operator "*" matches these operands
            operand types are: __half * __half
      activation_buffer_1[v] = hidden_state * gate;
                               ^
          detected during:
            instantiation of "void fused_gate_activation<T,useGelu>(T *, const T *, const T *, int, int, int) [with T=__half, useGelu=false]" at line 698
            instantiation of "void launch_gated_activation(T *, const T *, const T *, int, int, int, __nv_bool, cudaStream_t) [with T=__half]" at line 706
8 errors detected in the compilation of "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu".
[4/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -o pointwise_ops.cuda.o
[5/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -o dequantize.cuda.o
[6/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -o relu.cuda.o
[7/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -o transform.cuda.o
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(37): warning #177-D: variable "d0_stride" was declared but never referenced
      int d0_stride = hidden_dim * seq_length;
          ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(65): warning #177-D: variable "lane" was declared but never
referenced int lane = d3 & 0x1f; ^ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(107): warning #177-D: variable "half_dim" was declared but never referenced unsigned half_dim = (rotary_dim << 3) >> 1; ^ detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276 /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(108): warning #177-D: variable "d0_stride" was declared but never referenced int d0_stride = hidden_dim * seq_length; ^ detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276 /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(124): warning #177-D: variable "vals_half" was declared but never referenced T2* vals_half = reinterpret_cast<T2*>(&vals_arr); ^ detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276 /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(125): warning #177-D: variable "output_half" was declared but never referenced T2* output_half = reinterpret_cast<T2*>(&output_arr); ^ detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276 
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(142): warning #177-D: variable "lane" was declared but never referenced int lane = d3 & 0x1f; ^ detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" at line 276 [8/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -o softmax.cuda.o [9/11] /opt/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" 
-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -o apply_rotary_pos_emb.cuda.o [10/11] c++ -MMD -MF pt_binding.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/TH -isystem 
/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/include/THC -isystem /opt/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -c /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -o pt_binding.o /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = float]’: /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2006:5: required from here /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 538 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(), | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long 
int’ [-Wnarrowing] 539 | k * InferenceContext::Instance().GetMaxTokenLength(), | ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 547 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(), | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 548 | k * InferenceContext::Instance().GetMaxTokenLength(), | ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from 
‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’: /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2006:5: required from here /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 1575 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options); | ^~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = __half]’: /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2007:5: required from here /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ 
{aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 538 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(), | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:538:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 539 | k * InferenceContext::Instance().GetMaxTokenLength(), | ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:539:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 547 | {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(), | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:547:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& 
InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 548 | k * InferenceContext::Instance().GetMaxTokenLength(), | ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:548:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’: /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:2007:5: required from here /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing] 1575 | at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options); | ^~~~~~~~~~~~~~~~~ /run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1575:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const 
‘size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/run/media/user/hdd/ai-voice-cloning/./src/main.py", line 27, in <module>
    tts = load_tts()
  File "/run/media/user/hdd/ai-voice-cloning/src/utils.py", line 3666, in load_tts
    tts = TorToise_TTS(minor_optimizations=not args.low_vram, autoregressive_model_path=autoregressive_model, diffusion_model_path=diffusion_model, vocoder_model=vocoder_model, tokenizer_json=tokenizer_json, unsqueeze_sample_batches=args.unsqueeze_sample_batches, use_deepspeed=args.use_deepspeed)
  File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 308, in __init__
    self.load_autoregressive_model(autoregressive_model_path)
  File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/api.py", line 392, in load_autoregressive_model
    self.autoregressive.post_init_gpt2_config(use_deepspeed=self.use_deepspeed, kv_cache=self.use_kv_cache)
  File "/run/media/user/hdd/ai-voice-cloning/modules/tortoise-tts/tortoise/models/autoregressive.py", line 371, in post_init_gpt2_config
    self.ds_engine = deepspeed.init_inference(model=self.inference_model,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 160, in __init__
    self._apply_injection_policy(config)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 411, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 332, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 576, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 636, in _replace_module
    _, layer_id = _replace_module(child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 636, in _replace_module
    _, layer_id = _replace_module(child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 612, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 291, in replace_fn
    new_module = replace_with_policy(child,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 246, in replace_with_policy
    _container.create_module()
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/module_inject/containers/gpt2.py", line 20, in create_module
    self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 20, in __init__
    super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
    inference_module = builder.load()
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
    return self.jit_load(verbose)
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
    op_module = load(name=self.name,
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/run/media/user/hdd/ai-voice-cloning/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'transformer_inference'
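When a JIT build fails like this, the extensions root printed at the top of the log keeps the partially built `transformer_inference` directory around, so later rebuilds can pick up stale objects. After fixing the underlying toolchain problem, clearing that directory forces a clean rebuild on the next launch. A sketch, with the path assumed from this log:

```shell
# Remove the partially built extension so the next launch recompiles it
# from scratch. EXT_ROOT is the extensions root printed at the top of
# the log; adjust it to your own install.
EXT_ROOT="/run/media/user/hdd/ai-voice-cloning/models/torch_extensions/py310_cu118"
rm -rf "$EXT_ROOT/transformer_inference"
```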
Author

I figured out the problem. First I activate the Python environment:

`source ./venv/bin/activate`

Then I install torch, torchvision, and torchaudio:

`pip3 install torch torchvision torchaudio`

Then I follow the instructions from this link. This also works with CUDA 12.2: [if you must use CUDA 12.1 on your system you have two options:](https://github.com/microsoft/DeepSpeed/issues/2902#issuecomment-1530051657l)

> Remove the restriction on CUDA major version matching in DeepSpeed. You can do that by cloning the DeepSpeed repo and modifying `op_builder/builder.py` to remove the [exception](https://github.com/microsoft/DeepSpeed/blob/b4b63f521f3d12118a0240a6b46dd35ba79f5534/op_builder/builder.py#L88). Then install DeepSpeed from source with `pip install .` in the cloned repo.
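A quick way to confirm this kind of mismatch is to compare `nvcc --version` against `python -c "import torch; print(torch.version.cuda)"`. The restriction the linked workaround removes boils down to a major-version comparison between the two; below is a minimal sketch of that idea (the function names are hypothetical, and this is simplified from DeepSpeed's actual check):

```python
# Simplified, hypothetical sketch of the CUDA major-version check that
# DeepSpeed performs before JIT-compiling its ops: if the system CUDA
# toolkit and the CUDA build of torch disagree on the major version,
# the build is refused.

def major(version: str) -> int:
    """Return the major component of a dotted version string."""
    return int(version.split(".")[0])

def cuda_versions_compatible(torch_cuda: str, system_cuda: str) -> bool:
    """True when both CUDA versions share the same major version."""
    return major(torch_cuda) == major(system_cuda)

# e.g. torch built for CUDA 11.8 on a system with CUDA 12.2:
print(cuda_versions_compatible("11.8", "12.2"))  # False -> build refused
print(cuda_versions_compatible("11.8", "11.7"))  # True
```

Reinstalling torch (as above) so its CUDA build matches the system toolkit makes this check pass; patching DeepSpeed removes the check instead.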
Reference: mrq/ai-voice-cloning#424