20% inference speed increase for large VRAM (3090+) GPUs #84
I edited the function in devices.py that calculates the autoregressive batch size and managed to knock 60 seconds off inference with the standard preset. Same seed, same settings, same lines.
I'd say it works out to about a 20% speed increase, at least on my 3090. Setting it to 40 or 48 resulted in giant OOMs, so 32 seems to be the sweet spot.
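For context, stock tortoise-tts picks the batch size from available VRAM with a tiered lookup (`pick_best_batch_size_for_gpu` tops out at 16 for anything above ~14 GB). Below is a minimal sketch of the kind of edit described here, assuming devices.py uses a similar tiered lookup; the function name, thresholds, and structure are illustrative, not the exact repo code:

```python
import torch

def get_device_batch_size():
    """Pick an autoregressive batch size from free VRAM.

    Sketch only: the tiers up to 14 GB mirror stock tortoise-tts;
    the >20 GB tier reflects the 16 -> 32 bump described in this issue.
    """
    if not torch.cuda.is_available():
        return 1
    free, _total = torch.cuda.mem_get_info()
    free_gb = free / (1024 ** 3)
    if free_gb > 20:
        # 24 GB cards (3090-class): 32 works; 40 and 48 OOM on a 3090
        return 32
    if free_gb > 14:
        return 16
    if free_gb > 10:
        return 8
    if free_gb > 7:
        return 4
    return 1
```

Since the batch size sets how many autoregressive samples are processed per step, the speedup should scale roughly with it as long as everything still fits in VRAM, which is why the gain flattens into OOMs past 32 on a 24 GB card.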
Commit: e650800447

- Original "peasant" 16 batch size
- Modified 32 batch size
Per #87, it's not a good idea to blindly increase it, as longer sentences will break it.