Update 'tortoise/utils/device.py'

Noticed that the autoregressive batch size was being set based on VRAM size, but the tiers didn't scale up to the VRAM capacity of 90-series GPUs. Adjusted the thresholds so those cards go from 16 to 32 batches.
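
For context, the GiB figure these thresholds compare against comes from a device memory query; on CUDA devices the free VRAM can be read like this (a sketch for illustration, not the exact code in this file):

```python
import torch

# torch.cuda.mem_get_info() returns (free, total) in bytes for the current device;
# dividing by 1024**3 gives the GiB value the thresholds compare against.
free, total = torch.cuda.mem_get_info()
available_gb = free / (1024 ** 3)
print(f"Free VRAM: {available_gb:.1f} GiB of {total / (1024 ** 3):.1f} GiB")
```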

Using the standard preset with ChungusVGAN, I went from 16 autoregressive steps to 8.
Averaged over 3 runs, generation time dropped from 294 seconds with 16 batches to 234 seconds with 32. Can't complain at a ~1.26x speed increase from functionally 2 lines of code.

I restarted tortoise between runs and executed `torch.cuda.empty_cache()` just before loading the autoregressive model to clear the memory cache each time.
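
For reproducibility, a minimal sketch of the timing loop implied above (`load_autoregressive_model` and `generate` are hypothetical stand-ins for the actual calls; in practice each run was a fresh process):

```python
import time
import torch

def timed_run():
    # Clear cached allocator blocks before loading the AR model,
    # as described above, so each measurement starts from a clean cache.
    torch.cuda.empty_cache()
    model = load_autoregressive_model()  # hypothetical loader
    start = time.time()
    generate(model)                      # hypothetical generation call
    return time.time() - start
```
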
deviandice 2023-03-07 14:05:27 +00:00
parent 26133c2031
commit e650800447


@@ -50,7 +50,7 @@ def get_device_batch_size():
     name = get_device_name()
     if name == "dml":
-        # there's nothing publically accessible in the DML API that exposes this
+        # there's nothing publicly accessible in the DML API that exposes this
         # there's a method to get currently used RAM statistics... as tiles
         available = 1
     elif name == "cuda":
@@ -59,12 +59,23 @@ def get_device_batch_size():
         available = psutil.virtual_memory()[4]
     availableGb = available / (1024 ** 3)
-    if availableGb > 14:
+    print(f"Total device memory available: {availableGb}")
+    if availableGb > 18:
+        print(f"Setting AutoRegressive Batch Size to: 32")
+        print(f"Damn. Nice GPU Dude.")
+        return 32
+    elif availableGb > 14:
+        print(f"Setting AutoRegressive Batch Size to: 16")
         return 16
     elif availableGb > 10:
+        print(f"Setting AutoRegressive Batch Size to: 8")
         return 8
     elif availableGb > 7:
+        print(f"Setting AutoRegressive Batch Size to: 4")
         return 4
+    print(f"Setting AutoRegressive Batch Size to: 1")
+    print(f"Don't cry about it if it doesn't work.")
     return 1
 def get_device_count(name=get_device_name()):
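
If it helps to see where the value lands: upstream tortoise accepts the batch size as a constructor argument, so the pick above can also be passed explicitly (a hedged usage sketch; this fork's wiring may differ):

```python
from tortoise.api import TextToSpeech
from tortoise.utils.device import get_device_batch_size

# get_device_batch_size() returns 32/16/8/4/1 per the thresholds above.
tts = TextToSpeech(autoregressive_batch_size=get_device_batch_size())
```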