Game Development Reference
4. Assign lights to each tile.
5. Render geometry and compute shading for each generated fragment.
Subdivision of screen. We use regular N × N pixel tiles (e.g., N = 32). Very
large tiles yield a worse light assignment: each tile is affected by more light
sources, and each of those lights affects only a small subset of the samples in
the tile. Very small tiles make light assignment more expensive and increase the
required memory storage, especially when the tiles are small enough that many
adjacent tiles are found to be affected by the same light sources.
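As a rough illustration of how the tile size N translates into a tile count, the sketch below computes the tile grid for a given resolution. It rounds up so that partially covered tiles at the right and bottom edges are included. The helpers `tile_grid` and `tile_of_pixel` are hypothetical names used only for this example.

```python
from typing import Tuple


def tile_grid(width: int, height: int, tile_size: int) -> Tuple[int, int]:
    """Number of tiles along x and y; partial edge tiles count as full tiles."""
    tiles_x = (width + tile_size - 1) // tile_size   # integer ceiling division
    tiles_y = (height + tile_size - 1) // tile_size
    return tiles_x, tiles_y


def tile_of_pixel(x: int, y: int, tile_size: int) -> Tuple[int, int]:
    """Tile coordinates containing pixel (x, y)."""
    return x // tile_size, y // tile_size


if __name__ == "__main__":
    for n in (16, 32, 64):
        tx, ty = tile_grid(1920, 1080, n)
        print(f"N = {n}: {tx} x {ty} = {tx * ty} tiles")
```

At 1,920 × 1,080 with N = 32 this gives a 60 × 34 grid (2,040 tiles). Halving N to 16 gives 120 × 68 = 8,160 tiles, four times as many lists to build and store, which is one way to see why very small tiles raise the cost of light assignment.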
Optional pre-Z pass. An optional pre-Z pass can help in two ways. First, it is
required if we wish to find the Z-bounds for each tile in the next step. Second,
in the final rendering pass it can reduce the number of samples that need to be
shaded, through early-Z tests and similar hardware features.
The pre-Z pass should, of course, only include opaque geometry. Transparent
geometry is discussed in Section 4.5.
Though the benefit of a pre-Z pass is scene and view dependent, in our tests we
have found that adding it improves performance significantly. For instance, for
the images in Figure 4.1(a), rendering time is reduced from 22.4 ms (upper view)
and 37.9 ms (lower view) to 15.6 ms and 18.7 ms, respectively.
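To make the early-Z argument concrete, here is a toy CPU model that counts how many fragments get shaded with and without a pre-Z pass. It is a deliberate simplification under stated assumptions: fragments are hypothetical `(pixel, depth)` pairs, "early-Z" is modeled as a per-fragment LESS test in submission order, and the pre-Z variant lays down depth first and then shades with an EQUAL test. Real hardware behavior (hierarchical Z, quad granularity, and so on) is not modeled.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, float]  # hypothetical toy representation: (pixel index, depth)


def shaded_without_prez(fragments: List[Fragment]) -> int:
    """Early-Z with a LESS test: a fragment is shaded if it is nearer than
    whatever has been written to its pixel so far, so order matters."""
    depth: Dict[int, float] = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1
    return shaded


def shaded_with_prez(fragments: List[Fragment]) -> int:
    """Pass 1 writes depth only (no shading). Pass 2 uses an EQUAL test, so
    only the nearest fragment per pixel is shaded (ties would all pass)."""
    depth: Dict[int, float] = {}
    for pixel, z in fragments:  # depth-only pre-Z pass
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
    return sum(1 for pixel, z in fragments if z == depth[pixel])


if __name__ == "__main__":
    back_to_front = [(0, 0.9), (0, 0.5), (0, 0.1)]
    print(shaded_without_prez(back_to_front), shaded_with_prez(back_to_front))
```

For three overlapping fragments submitted back to front, the model shades all three without pre-Z but only one with it; with front-to-back submission the two counts coincide, which mirrors why the measured benefit is scene and view dependent.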
Optional minimum or maximum Z-bounds. If a depth buffer exists, e.g., from the
pre-Z pass described above, we can use this information to find (reduce) the
extents of each tile in the Z-direction (depth). This yields smaller per-tile
bounding volumes, reducing the number of lights that affect a tile during light
assignment.
Depending on the application, finding only the minimum or only the maximum
bounds can be sufficient (if bounds are required at all). Again, trans-
In conjunction with the pre-Z pass above, the minimum or maximum reduction
yields a further significant improvement for the views in Figure 4.1(a). Rendering
time with both pre-Z and minimum or maximum reduction is 10.9 ms (upper) and
13.8 ms (lower), respectively, which is quite comparable to the performance of
tiled deferred shading (8.5 ms and 10.9 ms). The reduction itself is implemented
using a loop in a fragment shader (for simplicity) and currently takes about
0.75 ms (at 1,920 × 1,080 resolution).
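The following CPU-side sketch shows what the per-tile minimum/maximum depth reduction computes, assuming the depth buffer is available as a row-major list of rows. The text implements the reduction as a fragment-shader loop on the GPU; this Python version only illustrates the result, and `tile_depth_bounds` is a hypothetical name.

```python
from typing import List, Tuple


def tile_depth_bounds(depth: List[List[float]],
                      tile_size: int) -> List[List[Tuple[float, float]]]:
    """Return (min_z, max_z) for every tile, indexed as bounds[ty][tx]."""
    height = len(depth)
    width = len(depth[0])
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    # Start each tile with an empty interval that any sample will shrink into.
    bounds = [[(float("inf"), float("-inf")) for _ in range(tiles_x)]
              for _ in range(tiles_y)]
    for y in range(height):
        ty = y // tile_size
        for x in range(width):
            tx = x // tile_size
            lo, hi = bounds[ty][tx]
            z = depth[y][x]
            bounds[ty][tx] = (min(lo, z), max(hi, z))
    return bounds
```

If only one bound is needed, as the text notes can be sufficient, the same loop can track just `min` or just `max`. Cleared background pixels are treated like any other sample here; whether to exclude them is an application choice not covered by this sketch.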
Light assignment. Next, we must assign lights to tiles. In essence, we want to
efficiently find which lights affect samples in which tiles. This requires a few
choices and considerations.
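One simple way to perform the assignment is a rectangle-overlap test, sketched here under the assumption that each light has already been projected to a conservative screen-space rectangle (in pixels) and a depth range. The `ScreenLight` type, its fields, and `assign_lights` are hypothetical, and this is not necessarily the exact method used in the text. When per-tile Z-bounds are supplied (in the same depth space as the lights), lights lying entirely in front of or behind a tile's samples are rejected.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScreenLight:
    # Hypothetical pre-projected light: conservative screen rectangle in
    # pixels plus a depth range in the same space as the tile Z-bounds.
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    z_min: float
    z_max: float


def assign_lights(lights: List[ScreenLight], width: int, height: int,
                  tile_size: int,
                  tile_bounds: Optional[List[List[Tuple[float, float]]]] = None
                  ) -> List[List[List[int]]]:
    """Return per-tile lists of light indices, indexed as result[ty][tx]."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    tile_lights: List[List[List[int]]] = [[[] for _ in range(tiles_x)]
                                          for _ in range(tiles_y)]
    for index, light in enumerate(lights):
        # Clamp the light's rectangle to the tile grid; off-screen lights
        # produce empty ranges and are skipped automatically.
        tx0 = max(0, int(light.x_min // tile_size))
        ty0 = max(0, int(light.y_min // tile_size))
        tx1 = min(tiles_x - 1, int(light.x_max // tile_size))
        ty1 = min(tiles_y - 1, int(light.y_max // tile_size))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                if tile_bounds is not None:
                    z_lo, z_hi = tile_bounds[ty][tx]
                    if light.z_max < z_lo or light.z_min > z_hi:
                        continue  # no overlap with the tile's depth extent
                tile_lights[ty][tx].append(index)
    return tile_lights
```

The loop visits each light once and touches only the tiles its rectangle covers, so the cost scales with the total screen area the lights cover, measured in tiles. This is also why the tile count matters when deciding where to run the assignment.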
In tiled shading, where the number of tiles is relatively small (for instance,
a resolution of 1,920 × 1,080 with 32 × 32 tiles yields just about 2,040 tiles),
it can be feasible to do the assignment on the CPU. This is especially true if the