Here's a number that keeps GPU cluster operators up at night: a single H100 GPU costs roughly $2-4 per hour. On a 100-GPU cluster running around the clock, every percentage point of wasted utilization costs $17.5K-35K annually.
Where the Time Goes
GPU idle time comes from three sources, and traditional monitoring only catches the obvious one:
Queue Gaps (5-15%)
Time between one job ending and the next starting. The scheduler needs time to allocate, and jobs don't pack perfectly.
Over-Provisioning (10-20%)
Jobs requesting 8 GPUs but only using 4-5. Teams pad requests because they can't predict what they'll need.
Scheduling Blindness (5-10%)
Suboptimal job ordering. A 2-GPU job that could fill a gap sits behind an 8-GPU job because the scheduler doesn't look ahead.
Combined, these account for 20-45% of cluster capacity. On our example 100-GPU cluster, at the $3/hour midpoint, that's $520K-$1.17M in annual waste.
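The first two buckets fall straight out of scheduler logs once you join them with per-job utilization samples. Here's a minimal sketch, assuming a hypothetical `Job` record with the fields shown; scheduling blindness is omitted because it only becomes visible when you replay the queue against a better ordering:

```python
from dataclasses import dataclass

@dataclass
class Job:
    """Hypothetical scheduler record; field names are illustrative."""
    start: float           # job start time, in hours
    end: float             # job end time, in hours
    gpus_requested: int    # GPUs the scheduler allocated
    gpus_used: float       # average GPUs actually busy, from utilization samples

def waste_breakdown(jobs: list[Job], cluster_gpus: int, window_hours: float) -> dict:
    """Split a window's idle GPU-hours into queue gaps and over-provisioning."""
    total = cluster_gpus * window_hours
    allocated = sum((j.end - j.start) * j.gpus_requested for j in jobs)
    used = sum((j.end - j.start) * j.gpus_used for j in jobs)
    return {
        "queue_gaps_pct": 100 * (total - allocated) / total,        # never scheduled
        "over_provisioning_pct": 100 * (allocated - used) / total,  # scheduled, idle
    }

# One week on a 100-GPU cluster, collapsed into a single synthetic job:
print(waste_breakdown([Job(0, 168, 90, 75.0)], cluster_gpus=100, window_hours=168))
# {'queue_gaps_pct': 10.0, 'over_provisioning_pct': 15.0}
```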
What 10% Improvement Actually Looks Like
The math is straightforward. The hard part is knowing where the waste is. That's an observability problem, not a hardware problem.
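A minimal version of that impact model in code, using this post's own example numbers as constants (a sketch, not VGAC's actual model):

```python
GPU_COST_PER_HOUR = (2.0, 4.0)  # the H100 range from the opening
CLUSTER_GPUS = 100
HOURS_PER_YEAR = 8760

def annual_value(points: float) -> tuple[float, float]:
    """Dollar value of recovering `points` percentage points of utilization."""
    return tuple(points / 100 * rate * CLUSTER_GPUS * HOURS_PER_YEAR
                 for rate in GPU_COST_PER_HOUR)

low, high = annual_value(10)
print(f"10-point improvement: ${low:,.0f}-${high:,.0f} per year")
# 10-point improvement: $175,200-$350,400 per year
```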
You don't need more GPUs. You need to see how the ones you have are actually being used.
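The raw signal isn't hard to collect. NVML exposes per-GPU utilization directly, and a sampler like the sketch below (using the pynvml bindings) is the usual starting point; the hard part, attributing each sample to a job and a queue state, is what a tool like VGAC layers on top.

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

# Print per-GPU SM utilization once a second for a minute. A GPU that sits
# near 0% while allocated to a job is over-provisioning, not a queue gap.
for _ in range(60):
    utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
    print(time.strftime("%H:%M:%S"), utils)
    time.sleep(1)

pynvml.nvmlShutdown()
```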
Find your cluster's waste
VGAC shows exactly where your GPU time goes, and what you can do about it.
View on GitHub