Validate Training Configuration always sets settings that take more VRAM than is available. #168

Closed
opened 2023-03-24 01:39:15 +00:00 by sazandora · 1 comment

Like the title says, I think you may have fine-tuned it TOO well to use as much VRAM as possible while staying within limits. No matter what I try, it always ends up 16-24 MiB over the VRAM limit and crashes if the settings are adjusted with validation.

Owner

I can only give rough estimates, given:

  • cards I've tested (6GiB 2060, 16GiB A4000, 16+16GiB 6800XTs, 80GiB A100)
  • semi-sane batch sizes
  • using bitsandbytes
  • a semi-normal distribution (by length) of audio files for training
  • having absolutely nothing else using the target GPU
    • my 2060 tests were when it was a slave card
    • my 2x6800XTs already get an overhead from having the model + kernels copied twice, so its metric of 32GiB is already conservative
    • the A4000 and A100 were on paperspace instances which have a baseline of zero

Like everything the web UI does, it's merely a suggestion.
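
Since the suggested values assume an otherwise idle GPU, one way to treat them as a starting point is to subtract whatever is already in use plus a safety margin, then scale the batch size down to match. The following is only a rough sketch of that idea, not what the web UI does: the function names, the 512 MiB default headroom, and the assumption that memory use grows roughly linearly with batch size are all hypothetical.

```python
def usable_vram_mib(total_mib: int, in_use_mib: int = 0, headroom_mib: int = 512) -> int:
    """Return the VRAM budget left after other processes and a safety margin.

    headroom_mib is an arbitrary illustrative default, not a measured value.
    """
    return max(total_mib - in_use_mib - headroom_mib, 0)


def scaled_batch_size(suggested: int, total_mib: int, budget_mib: int) -> int:
    """Scale a suggested batch size down in proportion to the reduced budget.

    Assumes (crudely) that VRAM use grows linearly with batch size.
    Never returns more than the suggestion, and never less than 1.
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    scaled = suggested * budget_mib // total_mib
    return max(1, min(suggested, scaled))


if __name__ == "__main__":
    # Hypothetical example: a 6 GiB card with 300 MiB already used by the desktop.
    budget = usable_vram_mib(6144, in_use_mib=300)
    print(budget, scaled_batch_size(64, 6144, budget))
```

If the run still goes out of memory, lowering the batch size another step is the simplest next adjustment.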

mrq closed this issue 2023-03-24 02:00:11 +00:00
Reference: mrq/ai-voice-cloning#168