Game Development Reference
No modification is necessary for the Z prepass, so we do not describe its implementation. The light-culling stage can be implemented in several ways thanks to the flexibility of current GPUs. Specifically, DirectCompute and read-writable structured data buffers, exposed as unordered access views (UAVs), are the key features for utilizing Forward+. In this section, we first describe which features of DirectX 11 are essential to making Forward+ work well on modern GPUs. Then we explain a light-culling implementation that works well for a scene with thousands of lights. If there are more lights, we might be better off considering other implementations, such as those described in [Harada et al. 11]. This section concludes by describing modifications for final shading.
5.3.1 Gather-Based Light Culling
During light culling, the computation is done on a per-tile basis. Therefore, it is natural to execute one thread group per tile. Threads in a group can share data through thread group shared memory (called shared memory from now on), which can eliminate much of the redundant computation within a thread group. Because the computation is identical for each tile, we explain it for a single tile.
The compute shader for light culling is dispatched as a two-dimensional (2D) grid of thread groups. Each thread group is assigned a unique 2D index, and each thread in a thread group is assigned a unique 2D index within the group.
In the pseudocode in this subsection, the following macros are used for these indices:
GET_GROUP_IDX : thread group index in X direction ( SV_GroupID );
GET_GROUP_IDY : thread group index in Y direction ( SV_GroupID );
GET_GLOBAL_IDX : global thread index in X direction ( SV_DispatchThreadID );
GET_GLOBAL_IDY : global thread index in Y direction ( SV_DispatchThreadID );
GET_LOCAL_IDX : local thread index in X direction ( SV_GroupThreadID );
GET_LOCAL_IDY : local thread index in Y direction ( SV_GroupThreadID ).
The first step is to compute the frustum of a tile in view space. To reconstruct the four side faces, we need to calculate the view-space coordinates of the four corner points of the tile. With these four points and the origin, the four side planes can be constructed.
// construct frustum: view-space positions of the tile corners
v[0] = projToView( 8*GET_GROUP_IDX,     8*GET_GROUP_IDY, 1.f );  // top-left corner
v[1] = projToView( 8*(GET_GROUP_IDX+1), 8*GET_GROUP_IDY, 1.f );  // top-right corner