Game Development Reference
Sort primitives. Our current implementation uses a modified version of the bitonic
sort included in the DirectX 11 software development kit (SDK) [Microsoft 10].
According to [Satish et al. 09], radix sort currently seems to be the fastest sorting
algorithm on GPU platforms. However, it is not within the scope of the current
chapter to discuss the advantages of radix sort against bitonic sort. Bitonic sort
allows the current application to build models as big as 2 18 (262 , 144) primitives.
A proper sorting implementation would allow the building of bigger models
from scratch every frame. The algorithm sorts the primitives in ascending order
based on the Morton code computed on the previous stage. For bigger models,
the framework uses a CPU-based BVH.
Build the tree. On the third step, the tree is built in a top-down fashion. Each
thread computes the data of two nodes, left and right. Since two children share
the same parent, race conditions are avoided. The number of passes is equal to
the depth of the tree because each pass computes the nodes of one level. It is easy
to see that the first levels are poorly parallelized, which represents a serious issue
on hierarchical structures. The algorithm stores three types of nodes: invalid,
leaf, and internal nodes. For invalid nodes, the algorithm checks the parent of
the current nodes. If PrimCount is negative, it means that the parent is invalid.
Since an invalid node cannot contain valid nodes underneath it, both children are
marked as invalid. When the i th node is less than 2 D − 1 , the node is internal. If
exceptions to this rule. Internal nodes may be marked as leaf nodes and their
children would be marked as invalid (which will never be visited) when
1. a branch finishes all possible geometry partitions before reaching the max-
imum depth of the tree, D (in this case, the primitives are stored in an
internal node, and the node is marked as a leaf node);
2. two or more primitives have the same Morton code (in this case, all primi-
tives with the same Morton code are stored in the same node; if the node is
an internal node, then the same process described previously is followed).
The first exception improves traversal performance by 20% because it avoids
unnecessary comparisons on deeper branches. In this way, the SLBVH emulates
an adaptive BVH by storing primitives in internal nodes. These internal nodes
have all the properties of a leaf node. The second exception has the same advan-
tage. If two primitives have the same Morton code, then in the end, they will be
stored on the same node. Instead of waiting until the last level of the tree, the
primitives are stored on a node as soon as possible.
Leaf nodes are also created when computing the nodes on the last level of the
tree, as shown in Listing 1.3. All remaining primitives are stored on either the
left or the right child. This is a disadvantage due to memory constraints. On
large models, in order to decrease the number of comparisons per ray, the depth