Game Development Reference
and resources between draws actually consumes most of the draw submission
performance. From these numbers we can confirm that the number of draw calls
per frame is mainly a CPU overhead limitation and that we can reach the GPU
submission limit somewhere between the instancing and the shared VAO results.
This is the level of performance we are looking for.
2.3.2 Understanding the Nature of the CPU Overhead in Our Test
What is the nature of the CPU overhead in this VAO case? First, there is a
validation step where the drivers check that no OpenGL error is generated by
the OpenGL commands, as well as checking whether the vertex format and the
bound buffers have changed. The second part concerns the vertex setup where
the drivers generate a fetch shader devoted to building the vertices by indexing
the array buffer according to the vertex format. Using a fetch shader lets the
GPU reuse its unified arithmetic logic units (ALUs) for vertex fetching.
To avoid increasing the number of VAO validations and vertex setups,
the application should sort the rendering by vertex format and pack multiple
meshes of identical vertex formats into a single VAO.
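
The sorting advice above can be sketched on the CPU side without any OpenGL calls. The following is a minimal illustration, not the authors' code: the `Draw` record, `formatId`, `meshId`, and both helper functions are hypothetical names, and the sketch assumes that each vertex format maps to exactly one VAO, so a change of `formatId` between consecutive draws stands for one VAO bind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record: formatId identifies a vertex format (and, by
// assumption, the single VAO holding every mesh of that format).
struct Draw {
    std::uint32_t formatId;
    std::uint32_t meshId;
};

// Counts how many VAO binds the given submission order would need.
// The first draw always counts as one bind.
inline std::size_t countVaoBinds(const std::vector<Draw>& draws) {
    std::size_t binds = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].formatId != draws[i - 1].formatId) {
            ++binds;
        }
    }
    return binds;
}

// Groups draws by vertex format. stable_sort preserves the original
// relative order of draws that share a format.
inline void sortByVertexFormat(std::vector<Draw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const Draw& a, const Draw& b) {
                         return a.formatId < b.formatId;
                     });
}
```

With five draws whose formats arrive in the order 2, 1, 2, 1, 3, the unsorted list needs five binds and the sorted list needs three, one per distinct format.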
2.3.3 Avoiding CPU Overhead by Reducing Resource Switching
In the previous sections we showed that switching resources per draw has a
significant impact on performance due to CPU overhead. From a software design
point of view, we can avoid this cost by packing multiple resources together and
using the GPU to index those resources.
For VAOs, we can pack together multiple meshes sharing the same vertex
format and rely on the base vertex to access the right data per draw. For textures,
we can rely on texture 2D arrays to expose many textures per texture unit. For
uniforms, we can pack them into large uniform buffers sorted by update rate
and index the uniform blocks in the shader to access the right data per draw.
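
As an illustration of the VAO case, the sketch below computes the per-draw parameters needed when several meshes of the same vertex format are appended to one shared vertex buffer and one shared index buffer. It is a sketch under stated assumptions, not the authors' implementation: `Mesh`, `PackedDraw`, and `packMeshes` are hypothetical names, indices are assumed to be 32-bit (`GL_UNSIGNED_INT`), and each mesh's indices are assumed to be relative to its own first vertex. The resulting values correspond to the `count`, `indices` (byte offset), and `basevertex` arguments of `glDrawElementsBaseVertex`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one mesh before packing.
struct Mesh {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Hypothetical per-draw parameters after packing into shared buffers.
struct PackedDraw {
    std::uint32_t indexCount;      // 'count' argument
    std::size_t   indexByteOffset; // 'indices' argument, as a byte offset
    std::int32_t  baseVertex;      // 'basevertex' argument
};

// Appends meshes back to back and records where each one starts.
// Assumes 32-bit indices, so each index occupies sizeof(std::uint32_t) bytes.
inline std::vector<PackedDraw> packMeshes(const std::vector<Mesh>& meshes) {
    std::vector<PackedDraw> draws;
    draws.reserve(meshes.size());
    std::uint32_t vertexCursor = 0; // first free vertex in the shared buffer
    std::size_t   indexCursor  = 0; // first free index in the shared buffer
    for (const Mesh& mesh : meshes) {
        draws.push_back({mesh.indexCount,
                         indexCursor * sizeof(std::uint32_t),
                         static_cast<std::int32_t>(vertexCursor)});
        vertexCursor += mesh.vertexCount;
        indexCursor  += mesh.indexCount;
    }
    return draws;
}
```

Because the base vertex is added to every index on the GPU side, the index data of each mesh can be copied unchanged, and the VAO is bound once for the whole group of draws.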
inside shaders, hence how much CPU side resource switching we can avoid.
Some resources like uniform blocks are extremely limited but others are
generously provided. On AMD Southern Islands the maximum size for a texture 2D
array is 16,384 (width) by 16,384 (height) by 8,192 (layers) by 16 (RGBA32F)
bytes for a total of 32 TB for a single texture. Obviously we can't store this much
memory on a graphics card but we can address it. This is one of the motivations
behind the creation of the AMD_sparse_texture [Sellers 12a] extension enabling
partially resident memory of the GPU resources.
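
The 32 TB figure can be checked with a few lines of arithmetic. The snippet below is only a worked calculation using the limits quoted above; the function and constant names are hypothetical, and "TB" is interpreted here in the binary sense (2^40 bytes).

```cpp
#include <cassert>
#include <cstdint>

// Addressable size in bytes of a 2D texture array with the given dimensions.
constexpr std::uint64_t textureArrayBytes(std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t layers,
                                          std::uint64_t bytesPerTexel) {
    return width * height * layers * bytesPerTexel;
}

// Limits quoted in the text; RGBA32F uses 4 channels of 4 bytes each.
constexpr std::uint64_t kMaxBytes = textureArrayBytes(16384, 16384, 8192, 16);
constexpr std::uint64_t kBinaryTerabyte = std::uint64_t{1} << 40;

// 2^14 * 2^14 * 2^13 * 2^4 = 2^45 bytes = 32 * 2^40 bytes.
static_assert(kMaxBytes == 32 * kBinaryTerabyte, "expected 32 TB");
```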
2.3.4 Indexing Resources in Shaders, Dynamically Uniform Expressions
Indexing resources is typically performed by relying on some of the built-in
variables provided by OpenGL (Figure 2.6).