Game Development Reference
Figure 1.4. The Stanford Bunny (left, 94 fps), Welsh Dragon (center, 31 fps), and
Happy Buddha (right, 49 fps) models rendered in real time using the SLBVH. The
application uses gloss mapping, Phong shading, and one ray for shadows with one light.
The SLBVH in Action
The tests were executed on an NVIDIA GTX 590, using just one of its GPUs.
The models compared are the Stanford Bunny (69,451 primitives), Crytek Sponza
(279,163 primitives), Conference Room (282,759 primitives), and Welsh Dragon
(2,097,152 primitives). The data is normalized and scaled to fit within a sphere
of radius 0.8. The average over six different camera positions is used to measure
performance. The resolution is set to 1,024 × 1,024 pixels, where each thread
computes the color of one pixel, and the group size is 16 × [...] × 1, launching
a total of 1,048,576 threads in the compute shader. The tests shown in
Figure 1.5 were executed using shadows, Phong shading, and gloss mapping.
Reflections were not activated for the benchmarks. Shadow rays are not coherent,
which heavily impacts performance. With shadows deactivated, the frame
is computed up to two times faster. However, a ray-reordering scheme [Garanzha
and Loop 10] could improve rendering times.
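The thread count quoted above follows directly from the resolution and the thread-group shape. The sketch below works through that arithmetic. It assumes a 16 × 16 × 1 group (the second group dimension is an assumption, as are the helper names `dispatch_dims` and `total_threads`), so it should be read as an illustration rather than the configuration actually used.

```python
import math

def dispatch_dims(width, height, group=(16, 16, 1)):
    """Thread-group counts (x, y, z) needed to cover a width x height image
    with one thread per pixel. The default group shape is an assumption."""
    gx, gy, _ = group
    # Integer ceiling division so partially covered tiles still get a group.
    return (-(-width // gx), -(-height // gy), 1)

def total_threads(groups, group=(16, 16, 1)):
    """Total threads launched: number of groups times threads per group."""
    return math.prod(groups) * math.prod(group)

groups = dispatch_dims(1024, 1024)
print(groups, total_threads(groups))  # (64, 64, 1) 1048576
```

Under this assumption, a 1,024 × 1,024 image needs 64 × 64 × 1 groups, which matches the 1,048,576 threads reported. For resolutions that are not multiples of the group size, the extra threads in the last row and column of groups would need a bounds check in the shader.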
A BVH built with the surface area heuristic (SAH-BVH) is used to compare traversal
performance with our SLBVH. The BVH is based on the one provided in the
PBRT framework [Pharr and Humphreys 04], and it is built on the CPU.
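For readers unfamiliar with the surface area heuristic, the following is a minimal sketch of the standard SAH cost of a candidate split. It is not the PBRT implementation: the `AABB` class, the `sah_cost` function, and the default cost constants are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def surface_area(self):
        dx, dy, dz = (h - l for l, h in zip(self.lo, self.hi))
        return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, n_left, right, n_right, c_trav=1.0, c_isect=1.0):
    """Expected cost of splitting `parent` into `left` and `right`.

    The probability of a ray hitting a child is approximated by the ratio
    of the child's surface area to the parent's. Cost constants are
    assumed values, not taken from the text.
    """
    pa = parent.surface_area()
    p_left = left.surface_area() / pa
    p_right = right.surface_area() / pa
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)

parent = AABB((0, 0, 0), (2, 1, 1))
# Candidate A: split in the middle, four primitives on each side.
cost_a = sah_cost(parent, AABB((0, 0, 0), (1, 1, 1)), 4,
                  AABB((1, 0, 0), (2, 1, 1)), 4)
# Candidate B: thin left box with one primitive, large right box with seven.
cost_b = sah_cost(parent, AABB((0, 0, 0), (0.5, 1, 1)), 1,
                  AABB((0.5, 0, 0), (2, 1, 1)), 7)
print(cost_a < cost_b)  # True: the builder would prefer candidate A
```

An SAH builder evaluates this cost for many candidate splits and keeps the cheapest one, which is why it tends to produce higher-quality trees at a higher build cost.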
Traversal Frame Rate
The rendering times are shown in Figure 1.5. Three models are traversed using an
SLBVH and a stack-based SAH-BVH. On small models with constant primitive
sizes, the SLBVH is as fast as a stack-based SAH-BVH. Because the algorithm
does not use a stack, its cache memory usage is significantly lower than that of a
BVH with higher tree quality. However, on larger models such as the Welsh Dragon, a high
number of primitives occupy the same leaf node, which slows rendering.
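To give a rough sense of the memory a traversal stack can consume, the sketch below estimates the per-group footprint of a stack-based traversal. The stack depth of 64 entries, the 4-byte node indices, and the 256-thread group are all hypothetical figures chosen for illustration; none of them come from the measurements above.

```python
def traversal_stack_bytes(threads, stack_depth=64, bytes_per_entry=4):
    """Memory needed if every thread keeps its own traversal stack.

    All default values are assumptions for illustration only.
    """
    return threads * stack_depth * bytes_per_entry

per_thread = traversal_stack_bytes(1)        # 256 bytes
per_group = traversal_stack_bytes(16 * 16)   # 65536 bytes (64 KiB)
print(per_thread, per_group)  # 256 65536
```

Under these assumptions a single thread group would need 64 KiB of stack storage, which a stackless traversal avoids entirely. The actual saving depends on the real stack depth and on where the stack is stored.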