20% Inference Speed Increase for Large VRAM (3090+) GPUs #84

Closed
opened 2023-03-07 14:13:34 +00:00 by deviandice · 1 comment

I edited the function in devices.py that calculates the autoregressive batch size and managed to knock off 60 seconds from standard preset inference. Same seed, same settings, same lines.

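I'd say it works out, at least for my 3090, to roughly a 20% speed increase (total generation time went from about 294 seconds to about 229 seconds). Setting the batch size to 40 or 48 resulted in GIANT OOMs, so 32 seems to be the sweet spot.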

https://git.ecker.tech/deviandice/tortoise-tts/commit/e6508004477b56bf358e0104f1d571e22f25d951
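For readers without the commit open, the kind of change described above can be sketched roughly as follows. This is a minimal sketch, not the code from devices.py: the function name `pick_autoregressive_batch_size`, the GiB thresholds, and the smaller tiers are all assumptions made for illustration. Only the 16 and 32 values at roughly 24 GB come from the logs in this post.

```python
def pick_autoregressive_batch_size(vram_gib: float, large_vram_batch: int = 32) -> int:
    """Map available VRAM (GiB) to an autoregressive batch size.

    Hypothetical tiers; only the ~24 GiB result (16 before, 32 after)
    is taken from the logs in this issue.
    """
    # (minimum GiB, batch size), checked from largest to smallest.
    tiers = [
        (20.0, large_vram_batch),  # 3090-class cards; 40/48 hit OOM for the author
        (14.0, 16),                # assumed tier
        (7.0, 8),                  # assumed tier
    ]
    for min_gib, batch_size in tiers:
        if vram_gib >= min_gib:
            return batch_size
    return 4  # assumed floor for small cards


print(pick_autoregressive_batch_size(23.99951171875))      # modified behaviour
print(pick_autoregressive_batch_size(23.99951171875, 16))  # original behaviour
```

Keeping the top-tier value as a parameter makes it easy to compare 16 against 32 on the same card without editing the tier table.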

Original Peasant 16 Batch Size

Loading voice: blaze
Reading from latent: ./voices/blaze/cond_latents_0fd98af0.pth
[1/4] Generating line: [I'm in love,] One of the best ways to be a good friend is to accept the care and support from your friends that they naturally offer.
Total free memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 16
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 76.78550267219543 seconds
[2/4] Generating line: [I'm in love,] If your friend enjoy helping their friends, and it is important to show that you accept and appreciate what they have to offer.
Total free memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 16
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 65.99525809288025 seconds
[3/4] Generating line: [I'm in love,] However, it is also important that you offer your support in return. Some people are not always good at asking for help when they need it.
Total free memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 16
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 78.1494972705841 seconds
[4/4] Generating line: [I'm in love,] In many cases, simply being willing to listen to whatever they have to share can be very helpful.
Total free memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 16
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 66.5297338962555 seconds
Loading Voicefixer
Loaded Voicefixer
Generation took 294.33222913742065 seconds, saved to './results/blaze//blaze_00017_combined_fixed.wav'

Modified 32 Batch Size

Loading voice: blaze
Reading from latent: ./voices/blaze/cond_latents_0fd98af0.pth
[1/4] Generating line: [I'm in love,] One of the best ways to be a good friend is to accept the care and support from your friends that they naturally offer.
Total device memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 32
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 59.91248345375061 seconds
[2/4] Generating line: [I'm in love,] If your friend enjoy helping their friends, and it is important to show that you accept and appreciate what they have to offer.
Total device memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 32
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 57.53539824485779 seconds
[3/4] Generating line: [I'm in love,] However, it is also important that you offer your support in return. Some people are not always good at asking for help when they need it.
Total device memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 32
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 61.35547494888306 seconds
[4/4] Generating line: [I'm in love,] In many cases, simply being willing to listen to whatever they have to share can be very helpful.
Total device memory available: 23.99951171875
Setting AutoRegressive Batch Size to: 32
Generating autoregressive samples
Computing best candidates using CLVP
Transforming autoregressive outputs into audio..
Generating line took 44.585120677948 seconds
Loading Voicefixer
Loaded Voicefixer
Generation took 229.13134860992432 seconds, saved to './results/blaze//blaze_00018_combined_fixed.wav'
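To put numbers on the two runs, here is a small sketch that compares the two "Generation took" totals from the logs above. The helper name `compare_runs` is made up for this example; the two timings are copied from the logs.

```python
def compare_runs(baseline_s: float, modified_s: float) -> dict:
    """Compare two wall-clock totals for the same workload."""
    saved = baseline_s - modified_s
    return {
        "seconds_saved": saved,
        # Share of the original run time that was eliminated.
        "time_reduction_pct": 100.0 * saved / baseline_s,
        # How much more work per second the faster run gets through.
        "throughput_gain_pct": 100.0 * (baseline_s / modified_s - 1.0),
    }


# Totals from the batch size 16 and batch size 32 logs.
result = compare_runs(294.33222913742065, 229.13134860992432)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

On these totals that works out to roughly 65 seconds saved, about 22% less wall-clock time, or about 28% more throughput. This is one run per setting, so treat the figures as indicative only.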
Owner

Per #87 (https://git.ecker.tech/mrq/ai-voice-cloning/issues/87), it's not a good idea to blindly increase the batch size, as longer sentences will break it.
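One way to reconcile the two observations (a larger batch is faster, but longer sentences can run out of memory) is to treat the larger value as a starting point and back off on failure. The sketch below is only an illustration and is not taken from this repository: `generate_with_backoff`, the `generate` callable, `fake_generate`, and the use of Python's built-in `MemoryError` as a stand-in for a GPU out-of-memory error are all assumptions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def generate_with_backoff(
    generate: Callable[[int], T],
    batch_size: int,
    min_batch_size: int = 1,
) -> T:
    """Call generate(batch_size), halving the batch size after each OOM failure."""
    while True:
        try:
            return generate(batch_size)
        except MemoryError:
            # Stand-in for a GPU OOM; a real pipeline would catch its
            # framework's own out-of-memory exception here.
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_generate(batch_size: int) -> str:
    # Hypothetical workload that only fits at batch size 16 or below.
    if batch_size > 16:
        raise MemoryError(f"batch size {batch_size} does not fit")
    return f"ok at batch size {batch_size}"


print(generate_with_backoff(fake_generate, 32))  # ok at batch size 16
```

Short lines would still run at the larger batch size, and only the lines that do not fit pay the cost of a retry.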

mrq closed this issue 2023-03-07 19:41:12 +00:00
Reference: mrq/ai-voice-cloning#84