We reach the peak primitive rate at about 320 primitives per draw when
using multi-draw indirect. This gives us the opportunity to use tessellation where
triangle complexity is low, ensuring that we still hit the peak primitive rate.
Tessellation is not only a great tool for adding geometric detail; it is also a
great tool for keeping the number of pixels per primitive constant.
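The threshold above can be checked on the CPU before the indirect buffer is uploaded. The following is a minimal sketch, not the chapter's implementation: the record layout is the one OpenGL defines for glMultiDrawElementsIndirect, while kPeakPrimitivesPerDraw, trianglesPerDraw, and tessellationCandidates are hypothetical names introduced here. It assumes triangle lists, ignores instanceCount for simplicity, and treats the figure of 320 as specific to the hardware measured in the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One indirect draw record, laid out as OpenGL expects it in the buffer
// bound to GL_DRAW_INDIRECT_BUFFER for glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices in this draw
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // offset of the first index
    std::int32_t  baseVertex;     // value added to each fetched index
    std::uint32_t baseInstance;   // first instance for instanced attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "records must be tightly packed");

// Assumption: the ~320 primitives per draw quoted in the text, which is a
// measurement on specific hardware and not a general constant.
constexpr std::uint32_t kPeakPrimitivesPerDraw = 320;

// Assumption: triangle lists, so three indices per primitive.
inline std::uint32_t trianglesPerDraw(const DrawElementsIndirectCommand& cmd) {
    return cmd.count / 3;
}

// Returns the positions of draws that fall below the threshold. These are
// the draws where enabling tessellation could raise the primitive count.
inline std::vector<std::size_t> tessellationCandidates(
        const std::vector<DrawElementsIndirectCommand>& cmds) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        if (trianglesPerDraw(cmds[i]) < kPeakPrimitivesPerDraw) {
            result.push_back(i);
        }
    }
    return result;
}
```

A caller would build the command vector, query tessellationCandidates, and then decide per draw whether to route the mesh through a tessellation-enabled program.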
2.5.2 Memory Repacking
When rendering dynamic scenes, some objects need to be created and deleted
during the lifetime of the program. When using the batching approach,
meshes must be added to and removed from an existing set of buffers, which
typically leads to some level of memory fragmentation. There are multiple
approaches to repack the memory and avoid wasting precious graphics memory:
• The application can rely on glCopyBufferSubData and glCopyImageSubData
  to fill empty space in buffers and textures. If the granularity of the data
  is too fine, the application has to issue many of these copy calls, which
  creates too much CPU overhead.
• The application can use an OpenCL kernel to move the data around. Such
  a kernel will probably underutilize the GPU's ALUs. However, because
  AMD Southern Islands allows us to run one graphics ring and two compute
  rings in parallel, the kernel execution could be hidden behind other shader
  and kernel executions.
• The application can rely on the virtual memory capability of the GPU, using
  AMD_sparse_buffer and AMD_sparse_texture to manage which memory pages
  are actually allocated. This relies on memory addressing to avoid
  moving the data while still using the graphics memory effectively.
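To make the granularity concern of the first approach concrete, here is a minimal CPU-side sketch of a repacking planner. It is an illustration under stated assumptions, not the chapter's code: Block, CopyOp, planRepack, and overlapsSelf are hypothetical names, offsets are in bytes, and live blocks are assumed not to overlap each other. Each planned CopyOp would correspond to one glCopyBufferSubData call, so merging ranges that are contiguous in the source directly reduces the number of calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A live mesh range inside a shared buffer, in bytes (hypothetical type).
struct Block { std::size_t offset; std::size_t size; };

// One planned copy; each would map to a single glCopyBufferSubData call.
struct CopyOp { std::size_t src; std::size_t dst; std::size_t size; };

// Plans a compaction that slides every live block toward offset 0.
// Blocks that are contiguous in the source are merged into one copy.
inline std::vector<CopyOp> planRepack(std::vector<Block> live) {
    std::sort(live.begin(), live.end(),
              [](const Block& a, const Block& b) { return a.offset < b.offset; });
    std::vector<CopyOp> ops;
    std::size_t cursor = 0;  // next free byte in the compacted layout
    for (const Block& b : live) {
        if (b.offset != cursor) {  // block has to move
            if (!ops.empty() && ops.back().src + ops.back().size == b.offset) {
                ops.back().size += b.size;  // extend the previous copy
            } else {
                ops.push_back({b.offset, cursor, b.size});
            }
        }
        cursor += b.size;
    }
    return ops;
}

// glCopyBufferSubData rejects overlapping source and destination ranges
// within the same buffer, so such copies need a staging buffer or have to
// be split. Destinations never exceed sources here, so one test suffices.
inline bool overlapsSelf(const CopyOp& op) {
    return op.dst + op.size > op.src;
}
```

With many small meshes scattered across the buffer, the plan degenerates into one copy per mesh, which is the CPU overhead the first approach warns about. The sparse-memory approach avoids the copies altogether.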
2.6 Future Work
At the time of this chapter's writing, the Programmable Vertex Pulling Rendering
Pipeline remains a work in progress, considering that OpenGL drivers are not yet
optimized for this purpose. Many API improvements could strengthen this design
for post-OpenGL 4 hardware:
• A built-in gl_DrawID would allow us to remove the need for vertex attributes
  and hence for vertex array objects. All of that setup, which is mostly CPU
  overhead, and the associated bandwidth cost would be avoided and replaced
  by a simple command processor counter.
• Currently, when using programmable vertex fetching, we basically lose
  the capabilities of certain draw call parameters: base vertex, base instance,
  and the offset to the first element or vertex. All those parameters are used to
 