compute the actual index of each vertex. We believe that such parameters
are not necessarily useful for all scenarios. The registers used for those
parameters should become user-defined variables to store the DrawID, for
The series of if statements required to select the right vertex format inside
the vertex fetching function is not very elegant as it introduces a level of
indirection. By backing subroutines in buffers, we could select a subroutine
per draw and effectively hide this indirection.
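The indirection described above can be pictured with a CPU-side analogy in C++. This is only a sketch under stated assumptions: the VertexFormat enum, the two fetch functions, and the per-draw format table are hypothetical names introduced for illustration, and buffer-backed subroutines are a proposal rather than an existing GLSL feature. The code simply contrasts a chain of if statements with a lookup indexed by the draw.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex formats, introduced only for this illustration.
enum class VertexFormat : std::uint32_t { PositionOnly = 0, PositionColor = 1 };

struct Vertex {
    float position[3];
    float color[4];
};

// Decodes a vertex stored as 3 floats; the color defaults to opaque white.
Vertex fetchPositionOnly(const std::vector<float>& data, std::size_t index) {
    const std::size_t base = index * 3;
    return Vertex{{data[base], data[base + 1], data[base + 2]},
                  {1.0f, 1.0f, 1.0f, 1.0f}};
}

// Decodes a vertex stored as 7 floats: 3 for position, then 4 for color.
Vertex fetchPositionColor(const std::vector<float>& data, std::size_t index) {
    const std::size_t base = index * 7;
    return Vertex{{data[base], data[base + 1], data[base + 2]},
                  {data[base + 3], data[base + 4], data[base + 5], data[base + 6]}};
}

// Approach 1: a series of if statements selects the format on every fetch.
Vertex fetchWithBranches(VertexFormat format, const std::vector<float>& data,
                         std::size_t index) {
    if (format == VertexFormat::PositionOnly) return fetchPositionOnly(data, index);
    if (format == VertexFormat::PositionColor) return fetchPositionColor(data, index);
    return Vertex{};
}

// Approach 2: the fetch routine is looked up once per draw from a table,
// loosely mimicking a subroutine selected per draw.
using FetchFunction = Vertex (*)(const std::vector<float>&, std::size_t);
const std::array<FetchFunction, 2> kFetchTable = {fetchPositionOnly,
                                                  fetchPositionColor};

Vertex fetchWithTable(const std::vector<VertexFormat>& formatPerDraw,
                      std::size_t drawId, const std::vector<float>& data,
                      std::size_t index) {
    const auto slot = static_cast<std::size_t>(formatPerDraw[drawId]);
    return kFetchTable[slot](data, index);
}
```

In the table variant, the format decision is made by indexing with the draw identifier, so the fetching code itself no longer contains the branch chain.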
The strategy behind the Programmable Vertex Pulling Rendering Pipeline
is to replace CPU resource switching by GPU-based indexing of the
resources. For this to be possible, we need to be able to access enough
different resources. The AMD Southern Islands architecture supports bindless
buffers and textures so that an unlimited number of resources could be bound.
We believe that both features, combined with partially resident memory,
would enable rendering of more complex scenes.
Beyond API improvements, we can also consider additional software design re-
Generating the draw indirect buffer by itself is a complex task that we
currently solve by using the brute force performance of the GPU in an
OpenCL kernel. Instead, could we rely on GPU-based space partitioning
techniques such as octrees, k-d trees, or bounding volume hierarchies (BVHs)?
Which space partitioning techniques can be efficiently implemented on the
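As a point of reference for the brute-force approach mentioned above, the following C++ sketch builds a list of indirect draw commands on the CPU by testing every object against a set of planes. It is not the chapter's OpenCL kernel: the Plane, BoundingSphere, and SceneObject types and the buildIndirectBuffer function are hypothetical, and storing the object index in baseInstance is an assumption made for illustration. Only the five-field command layout follows the structure OpenGL reads for indexed indirect draws.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Five 32-bit fields, in the order OpenGL reads them for indexed indirect draws.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical scene description used only by this sketch.
struct Plane { float nx, ny, nz, d; };            // normal points toward the visible side
struct BoundingSphere { float x, y, z, radius; };
struct SceneObject {
    BoundingSphere bounds;
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// A sphere is rejected as soon as it lies entirely behind one plane.
bool isVisible(const BoundingSphere& s, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        const float distance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (distance < -s.radius) return false;
    }
    return true;
}

// Brute force: test every object and emit one command per visible object.
std::vector<DrawElementsIndirectCommand> buildIndirectBuffer(
    const std::vector<SceneObject>& objects, const std::vector<Plane>& planes) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(objects.size());
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const SceneObject& object = objects[i];
        if (!isVisible(object.bounds, planes)) continue;
        // Assumption: baseInstance carries the object index so a shader could
        // use it to look up per-draw data.
        commands.push_back(DrawElementsIndirectCommand{
            object.indexCount, 1u, object.firstIndex, object.baseVertex,
            static_cast<std::uint32_t>(i)});
    }
    return commands;
}
```

Every object is visited on every call, which is the cost a space partitioning structure would aim to reduce by rejecting whole groups of objects at once.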
Could we use programmable vertex pulling to bring tile-based deferred
rendering to immediate-mode rendering GPUs? Is there an efficient GPU-based
algorithm to build lists of triangles and dispatch them using separate draws
per tile? Could we enable Order Independent Transparency [Knowles 12]
in a single pass if we expose a portion of the Local Data Store [AMD 12] in
the fragment shader stage?
Can we use AMD_query_buffer_object [Rakos 12b] to build a heuristic to
reorganize the memory in a memory management kernel?
In this chapter we presented the Programmable Vertex Pulling Rendering Pipeline,
which can render more complex scenes by significantly reducing the CPU overhead
caused by resource switching between draw calls. We detailed the possibilities
given by GPU batching and indexing in the two main parts of this approach: