Add DeepSpeed feature for tortoise #369

Merged
mrq merged 4 commits from ken11o2/ai-voice-cloning:master into master 2023-09-04 22:04:04 +07:00
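
For context, a minimal, hypothetical sketch of what switching the autoregressive model onto DeepSpeed's inference engine can look like, assuming DeepSpeed 0.10.x and its `deepspeed.init_inference` API; the actual integration lives in the companion tortoise-tts PR, and the function name here is illustrative:

```
# Hypothetical sketch, assuming DeepSpeed 0.10.x; the real change is in the
# companion tortoise-tts PR. `wrap_autoregressive` is an illustrative name.
import deepspeed
import torch

def wrap_autoregressive(model: torch.nn.Module, use_deepspeed: bool) -> torch.nn.Module:
    """Optionally wrap the AR model in DeepSpeed's inference engine."""
    model = model.eval()
    if use_deepspeed:
        # replace_with_kernel_inject swaps supported transformer layers for
        # DeepSpeed's fused transformer_inference kernels (the extension
        # being compiled in the logs below).
        model = deepspeed.init_inference(
            model,
            mp_size=1,  # single GPU; note this arg is deprecated per the log warning
            dtype=torch.float32,
            replace_with_kernel_inject=True,
        )
    return model
```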
ken11o2 added 3 commits 2023-09-04 19:05:58 +07:00

ai_voice_cloning_edited_to_use_deepspeed.zip

Resubmit the PR without this file, please.

> ai_voice_cloning_edited_to_use_deepspeed.zip
> Resubmit the PR without this file, please.
ken11o2 added 1 commit 2023-09-04 20:52:42 +07:00

Yes, I deleted the zip file. Please also check the PR in tortoise-tts. Thank you.
ken11o2 closed this pull request 2023-09-04 21:43:25 +07:00
ken11o2 reopened this pull request 2023-09-04 21:43:40 +07:00
mrq merged commit 29c270d1cc into master 2023-09-04 22:04:04 +07:00
mrq referenced this issue from a commit 2023-09-04 22:04:06 +07:00

Can you take a look inside src/cli.py, webui.py, and utils.py? I'm trying to make the CLI work with DeepSpeed, but something went wrong.

I did some tests with the web UI and the CLI.
If I run in the web UI with the same settings, inference takes ~20 s with DeepSpeed and ~40 s without DeepSpeed.
Here is the output from the web UI run:

```
Whisper detected
[2023-09-05 10:21:56,713] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
VALL-E detected
Bark detected
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading TorToiSe... (AR: /mnt/e/ai-voice-cloning/models/tortoise/autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: /mnt/e/ai-voice-cloning/models/tortoise/autoregressive.pth
[2023-09-05 10:23:13,696] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.10.2, git-hash=unknown, git-branch=unknown
[2023-09-05 10:23:13,698] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-09-05 10:23:13,698] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
Using /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5863604545593262 seconds
[2023-09-05 10:23:15,232] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False}
Using /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.023633956909179688 seconds
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
[1/1] Generating line: [I am really happy,] When use DeepSpeed is True, the methods that load models will use DeepSpeed for loading, and when it's False, the models will be loaded without DeepSpeed.
Loading voice: gun with model d1f79232
Loading voice: gun
Reading from latent: ./voices/gun//cond_latents_d1f79232.pth
------------------------------------------------------
Free memory : 17.803711 (GigaBytes)
Total memory: 23.999390 (GigaBytes)
Requested memory: 5.375000 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x1b4d800000
------------------------------------------------------
Generating line took 21.546339511871338 seconds
/home/ubuntu/miniconda3/envs/tortoise/lib/python3.10/site-packages/torchaudio/functional/functional.py:1458: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
Generation took 22.051681518554688 seconds, saved to './results//gun//gun_00039.wav'
```

If I run the CLI with the same settings, it always takes the same ~40 s, with or without DeepSpeed.
Here is the output from the CLI runs:

```
python src/cli.py --use-deepspeed True --prune-nonfinal-outputs True \
> --autoregressive-model ./models/tortoise/autoregressive.pth \
> --diffusion-model ./models/tortoise/diffusion_decoder.pth \
> --vocoder-model bigvgan_24khz_100band \
> --tokenizer-json ./modules/tortoise-tts/tortoise/data/tokenizer.json \
> --voice gun --preset ultra_fast --candidates 1 \
> --text "When use DeepSpeed is True, the methods that load models will use DeepSpeed for loading, and when it's False, the models will be loaded without DeepSpeed."  \
> --num_autoregressive_samples 16 --diffusion_iterations 30 --emotion "Happy" --experimentals "Conditioning-Free" \
> --voice_latents_chunks 4 --candidates 1 --diffusion_sampler "DDIM" --output-sample-rate 44100 \
> --voice_latents_original_ar True --voice_latents_original_diffusion True --no-embed-output-metadata True
Whisper detected
[2023-09-05 10:29:38,770] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
VALL-E detected
Bark detected
Loading TorToiSe... (AR: ./models/tortoise/autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: ./models/tortoise/autoregressive.pth
[2023-09-05 10:30:52,417] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.10.2, git-hash=unknown, git-branch=unknown
[2023-09-05 10:30:52,418] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-09-05 10:30:52,418] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
Using /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5363190174102783 seconds
[2023-09-05 10:30:53,854] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False}
Using /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.025423049926757812 seconds
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
[1/1] Generating line: [I am really happy,] When use DeepSpeed is True, the methods that load models will use DeepSpeed for loading, and when it's False, the models will be loaded without DeepSpeed.
Loading voice: gun with model d1f79232
Reading from latent: ./voices/gun//cond_latents_d1f79232.pth
Generating autoregressive samples:   0%|                                                          | 0/1 [00:00<?, ?it/s]
------------------------------------------------------
Free memory : 17.803711 (GigaBytes)
Total memory: 23.999390 (GigaBytes)
Requested memory: 5.375000 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x1b4d800000
------------------------------------------------------
Generating autoregressive samples: 100%|██████████████████████████████████████████████████| 1/1 [00:08<00:00,  8.86s/it]
Computing best candidates using CLVP: 100%|███████████████████████████████████████████████| 1/1 [00:20<00:00, 20.38s/it]
Generating line took 40.81342697143555 seconds
/home/ubuntu/miniconda3/envs/tortoise/lib/python3.10/site-packages/torchaudio/functional/functional.py:1458: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
Embedding metadata...: 100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27.25it/s]
Generation took 41.372543811798096 seconds, saved to './results//gun//gun_00040.wav'

(tortoise) ubuntu@HP:/mnt/e/ai-voice-cloning$ python src/cli.py --use-deepspeed False --prune-nonfinal-outputs True --autoregressive-model ./models/tortoise/autoregressive.pth --diffusion-model ./models/tortoise/diffusion_decoder.pth --vocoder-model bigvgan_24khz_100band --tokenizer-json ./modules/tortoise-tts/tortoise/data/tokenizer.json --voice gun --preset ultra_fast --candidates 1 --text "When use DeepSpeed is True, the methods that load models will use DeepSpeed for loading, and when it's False, the models will be loaded without DeepSpeed."  --num_autoregressive_samples 16 --diffusion_iterations 30 --emotion "Happy" --experimentals "Conditioning-Free" --voice_latents_chunks 4 --candidates 1 --diffusion_sampler "DDIM" --output-sample-rate 44100 --voice_latents_original_ar True --voice_latents_original_diffusion True --no-embed-output-metadata True
Whisper detected
[2023-09-05 10:35:36,272] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
VALL-E detected
Bark detected
Loading TorToiSe... (AR: ./models/tortoise/autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: ./models/tortoise/autoregressive.pth
[2023-09-05 10:36:49,144] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.10.2, git-hash=unknown, git-branch=unknown
[2023-09-05 10:36:49,145] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-09-05 10:36:49,146] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
Using /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.586035966873169 seconds
[2023-09-05 10:36:50,648] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False}
Using /mnt/e/ai-voice-cloning/models/torch_extensions/py310_cu118 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.028942584991455078 seconds
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
[1/1] Generating line: [I am really happy,] When use DeepSpeed is True, the methods that load models will use DeepSpeed for loading, and when it's False, the models will be loaded without DeepSpeed.
Loading voice: gun with model d1f79232
Reading from latent: ./voices/gun//cond_latents_d1f79232.pth
Generating autoregressive samples:   0%|                                                          | 0/1 [00:00<?, ?it/s]
------------------------------------------------------
Free memory : 17.803711 (GigaBytes)
Total memory: 23.999390 (GigaBytes)
Requested memory: 5.375000 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x1b4d800000
------------------------------------------------------
Generating autoregressive samples: 100%|██████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.92s/it]
Computing best candidates using CLVP: 100%|███████████████████████████████████████████████| 1/1 [00:20<00:00, 20.09s/it]
Generating line took 39.286839962005615 seconds
/home/ubuntu/miniconda3/envs/tortoise/lib/python3.10/site-packages/torchaudio/functional/functional.py:1458: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
Embedding metadata...: 100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 22.81it/s]
Generation took 39.848819732666016 seconds, saved to './results//gun//gun_00041.wav'
```
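
A hedged guess at the CLI discrepancy, not confirmed against the source: if `--use-deepspeed` is declared with `type=bool`, argparse turns any non-empty string into `True` (`bool("False")` is `True`), so the flag's value never actually varies between runs. A defensive string-to-bool converter for src/cli.py could look like this (`str2bool` is a hypothetical helper, not code from this repo):

```
# Hypothetical diagnostic sketch: robust boolean parsing for argparse.
import argparse

def str2bool(value):
    # type=bool would treat any non-empty string, including "False", as True.
    if isinstance(value, bool):
        return value
    if value.lower() in ("yes", "true", "t", "1"):
        return True
    if value.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--use-deepspeed", type=str2bool, default=False)
args = parser.parse_args(["--use-deepspeed", "False"])
assert args.use_deepspeed is False  # with type=bool this would be True
```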
