Game Development Reference
Figure 5.3. Breakdown of the computation time for three stages of Forward+ and
deferred on an AMD Radeon HD 6970 GPU and an AMD Radeon HD 7970 GPU.
Light processing. Forward+ reads the depth and light geometry buffers. Deferred
also reads them, but it must additionally read the float4 buffer storing normal
vectors and lighting properties, because lighting is done at this stage.
Forward+ therefore reads less memory than deferred.
As for the amount of computation, Forward+ only culls lights, whereas deferred
both culls lights and performs the lighting computation. Forward+ therefore
does less computation.
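As a rough illustration of the culling work done at this stage, the sketch below bins lights into screen-space tiles by testing each light's bounding circle against each tile rectangle. It is a simplified 2D stand-in written for the CPU, not the implementation measured here; a GPU version would typically run in a compute shader and test against per-tile frusta. The `Light` tuple, the `cull_lights` function, and the default tile size of 8 pixels are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical screen-space light: center (x, y) and radius, all in pixels.
Light = namedtuple("Light", ["x", "y", "radius"])


def circle_overlaps_rect(light, x0, y0, x1, y1):
    """Return True if the light's bounding circle touches the rectangle."""
    # Closest point on the rectangle to the circle center.
    cx = min(max(light.x, x0), x1)
    cy = min(max(light.y, y0), y1)
    dx, dy = light.x - cx, light.y - cy
    return dx * dx + dy * dy <= light.radius * light.radius


def cull_lights(lights, width, height, tile=8):
    """Return {(tile_x, tile_y): [light indices]} for every tile on screen."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    result = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * tile, ty * tile
            x1, y1 = x0 + tile, y0 + tile
            # Only indices are stored; no lighting is evaluated here.
            result[(tx, ty)] = [
                i for i, light in enumerate(lights)
                if circle_overlaps_rect(light, x0, y0, x1, y1)
            ]
    return result
```

The output of this pass is only a list of light indices per tile. A deferred light-processing pass would do the same culling and then also evaluate lighting for every pixel in the tile.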
For the memory write, Forward+ writes light indices, the size of which depends
on the scene and the tile size. If 8 × 8 tiles are used, deferred has to write
8 × 8 × 4 × 4 bytes per tile if a float4 value is written for each pixel. With
this data size, Forward+ can write 256 (8 × 8 × 4) light indices for a tile; if
the number of lights is less than 256 per tile, Forward+ writes less. In our
test scene, no tile was overlapped by more than 256 lights.
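The write-size comparison can be checked with a few lines of arithmetic. The sketch assumes that a float4 is four 32-bit floats (16 bytes) and that each light index is stored as a 32-bit integer; the index size is an assumption, chosen because it makes the 256-index figure work out. The function names are hypothetical.

```python
FLOAT_BYTES = 4  # one 32-bit float
INDEX_BYTES = 4  # assumption: one light index is a 32-bit integer


def deferred_bytes_per_tile(tile_w=8, tile_h=8, channels=4):
    """Bytes deferred writes per tile when every pixel stores a float4."""
    return tile_w * tile_h * channels * FLOAT_BYTES


def forward_plus_bytes_per_tile(num_lights):
    """Bytes Forward+ writes per tile: one index per overlapping light."""
    return num_lights * INDEX_BYTES


def break_even_light_count(tile_w=8, tile_h=8, channels=4):
    """Number of light indices that fit in the deferred per-tile write size."""
    return deferred_bytes_per_tile(tile_w, tile_h, channels) // INDEX_BYTES
```

Under these assumptions an 8 × 8 tile costs deferred 1,024 bytes, which is room for 256 light indices. A tile overlapped by 100 lights would cost Forward+ 400 bytes.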
To summarize this stage, Forward+ is reading, computing, and writing less
than deferred. This is why Forward+ is so fast at this stage.
Final shading. Forward+ clearly takes more time than deferred at shading
because it has to iterate through all the lights in the pixel shader. This is
a performance disadvantage, but it is designed this way to get more freedom.
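To make the per-pixel cost concrete, the loop can be sketched as follows. This is a CPU-side illustration, not shader code from the demo: the `shade_pixel` function, the `(position, intensity)` light representation, and the plain diffuse term without attenuation are all assumptions made for the example. The point is only that each pixel looks up its tile's light list and loops over every index in it.

```python
import math


def shade_pixel(pixel_xy, position, normal, lights, tile_lists, tile=8):
    """Sum a simple diffuse term over the lights listed for the pixel's tile.

    lights     -- list of (light_position, intensity) pairs (hypothetical)
    tile_lists -- {(tile_x, tile_y): [light indices]} from the culling pass
    """
    tile_key = (pixel_xy[0] // tile, pixel_xy[1] // tile)
    total = 0.0
    # The cost of this loop grows with the number of lights in the tile.
    for index in tile_lists.get(tile_key, []):
        light_pos, intensity = lights[index]
        to_light = [l - p for l, p in zip(light_pos, position)]
        dist = math.sqrt(sum(c * c for c in to_light))
        if dist == 0.0:
            continue  # light sits exactly on the surface point; skip it
        n_dot_l = sum(n * c for n, c in zip(normal, to_light)) / dist
        total += intensity * max(0.0, n_dot_l)
    return total
```

Because the material evaluation happens inside this loop, each material can use its own shading code, which is where the extra freedom comes from.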
Forward+ in the AMD Leo Demo
We created the AMD Leo demo to show an implementation of Forward+ in real-
time in a real-world setting. A screenshot from the demo is shown in Figure 5.4.
We chose scene geometry on the order of what can be found in current PC-based
video games (one to two million polygons). We also had the objective of rendering
with a unique stylized look that could be characterized as “CGish” in that it uses
material types that resemble those found in an offline renderer. There are more