GPU, which defines the number of primitives that the GPU can render per frame
without becoming the bottleneck. Because the draw call limit is so small while
the peak primitive rate is comparatively high, programs must aim to render as
many primitives per draw call as possible.
To tackle this issue, Direct3D 11 introduced deferred contexts, which let
multiple threads record command lists that are later executed by the immediate
context. Unfortunately, this strategy is not particularly effective and doesn't
scale linearly with the number of cores used. This is mainly due to
synchronization inside the Direct3D 11 runtime and to the fact that today's
GPUs have only a single graphics ring, which can process only one command
queue at a time.
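The split between parallel recording and serial execution can be illustrated with a CPU-only sketch (all names here are hypothetical, not Direct3D API calls): worker threads record commands into private lists, then one "immediate" thread replays every list. Recording parallelizes, but the replay, like the GPU's single graphics ring, stays serial and caps the overall speedup.

```c
#include <pthread.h>

/* CPU-only sketch of the deferred-context idea; names are illustrative. */
enum { WORKERS = 4, CMDS_PER_WORKER = 1000 };

typedef struct { int count; } CommandList;

/* Each worker thread records into its own private command list. */
static void *record(void *arg)
{
    CommandList *list = (CommandList *)arg;
    for (int i = 0; i < CMDS_PER_WORKER; ++i)
        list->count++; /* stand-in for recording one draw command */
    return NULL;
}

/* Record on WORKERS threads in parallel, then execute serially. */
int run_frame(void)
{
    pthread_t threads[WORKERS];
    CommandList lists[WORKERS] = {0};

    for (int w = 0; w < WORKERS; ++w)
        pthread_create(&threads[w], NULL, record, &lists[w]);
    for (int w = 0; w < WORKERS; ++w)
        pthread_join(threads[w], NULL);

    /* Serial replay: this part does not scale with core count. */
    int executed = 0;
    for (int w = 0; w < WORKERS; ++w)
        executed += lists[w].count;
    return executed;
}
```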
An earlier solution, introduced with OpenGL 3 hardware, was instancing. It
provides one way to cope with growing scene complexity without adding CPU
overhead. Unfortunately, instancing merely replicates a single mesh many
times, which limits the complexity we can reach. Another approach is batching,
which aggregates multiple meshes into a single set of buffer objects and
issues a single draw call. This performs well for perfectly static geometry
and is relatively practical to use. However, it limits the amount and
granularity of the culling we can perform, and it can waste memory when
padding meshes into fixed-size memory chunks.
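The padding cost is easy to quantify. The sketch below (the chunk size and mesh sizes are illustrative assumptions, not values from the text) computes how many bytes are wasted when a mesh is rounded up to whole fixed-size chunks:

```c
#include <stddef.h>

/* Illustrative fixed chunk size for a hypothetical batcher. */
#define CHUNK_SIZE 4096

/* Number of whole chunks needed to hold a mesh of the given size. */
size_t chunks_needed(size_t mesh_bytes)
{
    return (mesh_bytes + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Bytes wasted by rounding the mesh up to whole chunks. */
size_t padding_waste(size_t mesh_bytes)
{
    return chunks_needed(mesh_bytes) * CHUNK_SIZE - mesh_bytes;
}
```

A mesh one byte larger than a chunk wastes almost an entire chunk, which is why many small, irregularly sized meshes make batching memory-hungry.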
To reach a much higher scene complexity, we are looking for a solution where:
• we could submit many more draws per frame;
• each draw could render meshes with different geometry and numbers of vertices;
• each draw could render a small number of primitives but still sustain the
GPU's peak primitive rate;
• each draw could access different resources.
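One direction that addresses these criteria is indirect multi-draw submission, where each draw is described by a small GPU-readable record. As a sketch, the per-draw command layout defined by OpenGL's GL_ARB_multi_draw_indirect (consumed by glMultiDrawElementsIndirect) looks like this in C; each record selects its own index range and base vertex, so one submission can cover many distinct meshes:

```c
#include <stddef.h>

/* Per-draw record layout from GL_ARB_multi_draw_indirect: five tightly
 * packed 32-bit unsigned integers, read by the GPU from a buffer. */
typedef struct DrawElementsIndirectCommand {
    unsigned int count;         /* number of indices for this draw        */
    unsigned int instanceCount; /* usually 1 when meshes are all distinct */
    unsigned int firstIndex;    /* offset into the shared index buffer    */
    unsigned int baseVertex;    /* offset into the shared vertex buffer   */
    unsigned int baseInstance;  /* source for a per-draw identifier       */
} DrawElementsIndirectCommand;
```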
Evaluating Draw Call CPU Overhead and the GPU Draw Submission Limitation
The Performance Test
Due to the complex nature of real-time graphics software, it is difficult to
understand the source of the cost of a single draw call: Is the 1,000 to 5,000
draw call limit a GPU or a CPU limitation?
In this section, we follow our intuition that the more resource switching we
do between draws, the higher the CPU overhead will be, while the GPU pays a
fixed cost for each draw. To build a relevant test for this hypothesis, we
define the following criteria: