Game Development Reference
Because the constant engine of AMD OpenGL 4 GPUs can only fetch a single
resource header per workgroup, indexing an array of resources must be done using
what OpenGL calls dynamically uniform expressions [Kessenich 12, Section 3.8.3].
All work items in a workgroup must use the same index to access the same
resource. Before OpenGL 4.3 introduced the compute shader stage, OpenGL
didn't have the notion of workgroup or work item, but we can consider each vertex,
primitive, or fragment as a work item. If an index is set per primitive, it will be
a dynamically uniform expression in the fragment shader stage because all the
fragments of that primitive will belong to the same workgroup. Resources that must be indexed by
dynamically uniform expressions are sampler arrays, image arrays, uniform block
arrays, atomic counter buffer arrays, shader storage block arrays, and subroutine
index arrays. Furthermore, GLSL shaders may access resources through a series of
if statements. This is nothing but another embodiment of resource indexing that
requires following the same constraints as other dynamically uniform expressions.
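The constraint above can be illustrated with a small CPU-side model. This is only a sketch under stated assumptions: the function name isUniformPerWorkgroup, the flat list of per-work-item indices, and the workgroupSize parameter are hypothetical and do not correspond to any OpenGL API. It simply checks that every work item in each workgroup picks the same resource index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side model only (not an OpenGL call): returns true when all work items
// of each workgroup use the same resource index, which is the condition the
// text describes for a dynamically uniform index.
// indices[i] is the index chosen by work item i; workgroupSize is a
// hypothetical parameter, since real hardware decides the grouping.
bool isUniformPerWorkgroup(const std::vector<unsigned>& indices,
                           std::size_t workgroupSize)
{
    assert(workgroupSize > 0);
    for (std::size_t begin = 0; begin < indices.size(); begin += workgroupSize) {
        const std::size_t end = std::min(begin + workgroupSize, indices.size());
        // Compare every work item of the group against the first one.
        for (std::size_t i = begin + 1; i < end; ++i) {
            if (indices[i] != indices[begin]) {
                return false;
            }
        }
    }
    return true;
}
```

With a workgroup size of 4, the indices 2, 2, 2, 2, 5, 5, 5, 5 satisfy the constraint, while 2, 2, 3, 2, 5, 5, 5, 5 do not, because one work item of the first workgroup diverges.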
Programmable Vertex Pulling
Relying on resource batching and GPU indexing of resources can significantly
reduce the CPU overhead [Hilaire 12]. However, this approach still suffers from
several limitations. First, CPU overhead still exists in the form of CPU draw call
submission. Second, actual real-time rendering applications don't just submit draws;
they need to first select the draws that they expect to be visible by performing culling. This
task is not trivial, as it often relies on space partitioning techniques to quickly
analyze the scene, which typically consumes a lot of CPU time. To hide this cost, many
applications use a dedicated thread for this task, introducing a frame of latency.
When the scene increases in complexity, as we are imagining in this chapter, the
time consumed by this thread increases until its latency can't be hidden anymore.
One idea is to move the culling and sorting from the CPU to the GPU by
relying on OpenCL or the OpenGL compute shader stage so that the GPU itself
selects and submits the draws. We call this approach Programmable Vertex
Pulling; it consists of two stages. On the one hand, the Programmable Draw Dispatch stage uses compute
shaders with OpenGL 4.3 multi draw indirect buffers [Sellers 12b]. On the other
hand, the Programmable Vertex Fetching stage uses the vertex shader stage to
index into shader storage buffers or texture buffers to manually compose each
vertex instead of using the VAO.
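The two stages can be modeled on the CPU to make the data flow concrete. The following is a minimal sketch, not the chapter's implementation: DrawRecord, buildIndirectCommands, Vertex, and fetchVertex are hypothetical names, the visible flag stands in for a real culling test, and storing the original draw index in baseInstance is an assumption made for illustration. Only the DrawArraysIndirectCommand field layout follows the OpenGL indirect draw command structure. In a real pipeline the first function would run in a compute shader writing to an indirect buffer, and the second would run in the vertex shader using the vertex index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field layout of one indirect draw command as consumed by
// glDrawArraysIndirect / glMultiDrawArraysIndirect.
struct DrawArraysIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t first;
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "tightly packed");

// Hypothetical per-draw record; 'visible' stands in for a real culling test.
struct DrawRecord {
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
    bool visible;
};

// Models the Programmable Draw Dispatch stage: keep only visible draws and
// compact them into a list of indirect commands.
std::vector<DrawArraysIndirectCommand>
buildIndirectCommands(const std::vector<DrawRecord>& draws)
{
    std::vector<DrawArraysIndirectCommand> commands;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (!draws[i].visible) {
            continue; // culled: no command is emitted
        }
        DrawArraysIndirectCommand cmd;
        cmd.count = draws[i].vertexCount;
        cmd.instanceCount = 1u;
        cmd.first = draws[i].firstVertex;
        // Assumption: carry the original draw index so later stages can
        // look up per-draw data.
        cmd.baseInstance = static_cast<std::uint32_t>(i);
        commands.push_back(cmd);
    }
    return commands;
}

struct Vertex {
    float position[3];
    float texcoord[2];
};

// Models the Programmable Vertex Fetching stage: compose a vertex by indexing
// flat attribute buffers manually instead of relying on a VAO.
Vertex fetchVertex(const std::vector<float>& positions,
                   const std::vector<float>& texcoords,
                   std::uint32_t vertexId)
{
    assert(3u * (vertexId + 1u) <= positions.size());
    assert(2u * (vertexId + 1u) <= texcoords.size());
    Vertex v;
    for (std::size_t c = 0; c < 3; ++c) {
        v.position[c] = positions[3u * vertexId + c];
    }
    for (std::size_t c = 0; c < 2; ++c) {
        v.texcoord[c] = texcoords[2u * vertexId + c];
    }
    return v;
}
```

Keeping positions and texture coordinates in separate flat arrays mirrors how a vertex shader might read from distinct shader storage buffers or texture buffers; an interleaved layout would work equally well with different index arithmetic.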
Programmable Draw Dispatch
OpenGL 4.0 introduced the draw indirect functionality that allows storing the
parameters of draw commands into a buffer object. Unfortunately, a call to such