Game Development Reference
based on performance needs. For instance, we can change LIGHT_LOOP_BEGIN to
iterate over only a few lights on a slower platform.
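As a rough illustration of this idea, the sketch below uses C++ stand-ins for the shader macros. It is not the macro definition used in our implementation: the parameterized form of LIGHT_LOOP_BEGIN, the closing LIGHT_LOOP_END, the MAX_LIGHTS_PER_PIXEL cap, and the accumulateLights function are all assumptions made for this example. Lights beyond the cap are simply skipped.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-platform cap; a slower platform would build with a
// smaller value (for example -DMAX_LIGHTS_PER_PIXEL=2).
#ifndef MAX_LIGHTS_PER_PIXEL
#define MAX_LIGHTS_PER_PIXEL 4
#endif

// C++ stand-ins for the shader macros (assumed form, not the original one).
// The loop visits at most MAX_LIGHTS_PER_PIXEL entries of the light list.
#define LIGHT_LOOP_BEGIN(lights, i)                                   \
    for (std::size_t i = 0,                                           \
         i##_end = std::min<std::size_t>((lights).size(),             \
                                         MAX_LIGHTS_PER_PIXEL);       \
         i < i##_end; ++i) {
#define LIGHT_LOOP_END }

// Hypothetical shading routine: sums the intensities of the visited lights.
float accumulateLights(const std::vector<float>& lightIntensities) {
    float sum = 0.0f;
    LIGHT_LOOP_BEGIN(lightIntensities, i)
        sum += lightIntensities[i];
    LIGHT_LOOP_END
    return sum;
}
```

Because the cap lives inside the macro, the shading code between the two macros does not change when the limit is tuned for a platform.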
An optimization we can do for the host side is to sort all render draw calls
by material type and render all triangles that belong to each unique material at
the same time. This reduces GPU state changes and makes good use of the cache
because all pixels needing the same data will be rendered together.
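A minimal host-side sketch of this sorting step follows. The DrawCall record, its fields, and both function names are hypothetical and chosen only for illustration. A real renderer would probably sort on a richer key than a single material ID.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw-call record; the field names are illustrative only.
struct DrawCall {
    std::uint32_t materialId;  // stands in for shader, textures, and state
    std::uint32_t meshId;      // which geometry to draw
};

// Sort so that draw calls sharing a material become adjacent.
// stable_sort keeps the original submission order within each material.
void sortByMaterial(std::vector<DrawCall>& calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.materialId < b.materialId;
                     });
}

// Count how often the material changes while walking the list in order,
// a simple proxy for the number of GPU state changes.
std::size_t countMaterialSwitches(const std::vector<DrawCall>& calls) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < calls.size(); ++i) {
        if (calls[i].materialId != calls[i - 1].materialId) {
            ++switches;
        }
    }
    return switches;
}
```

For a submission order with materials 2, 1, 2, 1, the list switches material three times before sorting and once afterward.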
We implemented Forward+ using DirectX 11 and benchmarked it on the scene
shown in Figure 5.2, comparing its performance against that of compute-based
deferred lighting [Andersson 11].
In short, Forward+ was faster on both the AMD Radeon HD 6970 and HD
computing, it makes sense. Three timers are placed in a frame of the benchmark
to measure time for prepass, light processing, and final shading. In Forward+,
these three are depth prepass, light culling, and final shading. In compute-based
deferred, they are the geometry pass (or G-pass), which exports geometry information
to full-screen buffers; light processing, which consists of light culling and
screen-space light accumulation; and final shading.

Prepass. Forward+ writes a screen-sized depth buffer while deferred writes a
depth buffer and another float4 buffer that packs the normal vector of the visible
pixel. The specular coefficient can be stored in the W component of the buffer,
too. Therefore, Forward+ writes less than deferred and is faster on prepass.
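To put rough numbers on this difference, the sketch below estimates the bytes written per frame in the prepass. The byte sizes are assumptions, not measured values: a 32-bit depth value (4 bytes per pixel) and a float4 built from four 32-bit floats (16 bytes per pixel). The actual buffer formats may differ, and the constant and function names are hypothetical.

```cpp
#include <cstdint>

// Assumed formats (not stated in the text): 32-bit depth, 4 x 32-bit float4.
constexpr std::uint64_t kDepthBytesPerPixel  = 4;
constexpr std::uint64_t kFloat4BytesPerPixel = 16;

// Forward+ prepass: depth buffer only.
constexpr std::uint64_t forwardPlusPrepassBytes(std::uint64_t width,
                                                std::uint64_t height) {
    return width * height * kDepthBytesPerPixel;
}

// Deferred G-pass: depth buffer plus one float4 buffer (normal + specular).
constexpr std::uint64_t deferredPrepassBytes(std::uint64_t width,
                                             std::uint64_t height) {
    return width * height * (kDepthBytesPerPixel + kFloat4BytesPerPixel);
}
```

Under these assumptions, at the 1,280 × 720 resolution of the benchmark scene, Forward+ writes 3,686,400 bytes and deferred writes 18,432,000 bytes, five times as much.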
Figure 5.2. A scene with 3,072 dynamic lights rendered in 1,280 × 720 resolution.
(a) Using diffuse lighting. (b) Visualization of number of lights overlapping each tile.
Blue, green and red tiles have 0, 25, and 50 lights, respectively. The numbers in between
are shown as interpolated colors. The maximum number is clamped to 50.
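The tile coloring described in the caption can be sketched as follows. The caption only says the colors are interpolated, so the linear blend from blue to green to red is an assumption, and the Color struct and lightCountToColor function are hypothetical names.

```cpp
#include <algorithm>

// RGB color with components in [0, 1].
struct Color {
    float r, g, b;
};

// Map a per-tile light count to a heat-map color:
// 0 lights -> blue, 25 -> green, 50 or more -> red.
// Linear interpolation between these anchors is an assumption.
Color lightCountToColor(int lightCount) {
    const int kMid = 25;
    const int kMax = 50;
    // Clamp to [0, kMax], matching the caption's clamp at 50 lights.
    const int n = std::max(0, std::min(lightCount, kMax));
    if (n <= kMid) {
        const float t = static_cast<float>(n) / kMid;  // 0 at blue, 1 at green
        return {0.0f, t, 1.0f - t};
    }
    const float t = static_cast<float>(n - kMid) / (kMax - kMid);  // 0 at green, 1 at red
    return {t, 1.0f - t, 0.0f};
}
```

A tile with 80 overlapping lights is clamped to 50 and drawn in pure red, the same as a tile with exactly 50.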