Game Development Reference
Figure 3.5. Aliased scene before applying SDAA.
Figure 3.6. Scene antialiased with SDAA.
Figure 3.7. Zoomed pixels before SDAA.
Figure 3.8. Zoomed pixels after SDAA.
buffer may incur a sizeable overhead. The addition of a pre-Z pass may, of course,
also improve performance. This is all very dependent on the application, so it is
impossible to give a universal statement about performance. However, the sample
application was profiled using GPU PerfStudio 2, from the default start position
using an AMD Radeon HD5870 at a 1,920 × 1,080 resolution. The results are
presented in Table 3.1.
The overhead of SDAA consists of the final resolve pass and the cost of copying
the depth buffer after the pre-Z pass. The cost of the depth buffer copy is hard to
do much about, but the resolve pass can be optimized. The main bottleneck in
the standard DX10 implementation is texture fetches. GPU PerfStudio reports
the texture unit being busy 97.5% of the time. Using Gather() in DX10.1, we can
significantly reduce the number of fetches required, from 17 to 9. This brings the
cost of the resolve pass down from 0.33 ms to 0.26 ms, and the total overhead of
SDAA down from 0.51 ms to 0.43 ms. The texture unit is now 75% busy.
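
The figures quoted above can be cross-checked against Table 3.1 with a few lines of arithmetic. The sketch below is only a sanity check, not part of the sample application. The labels "DX10" and "DX10.1" for the two table columns are an assumption drawn from the surrounding text, and the function names are made up for illustration. It computes the SDAA overhead two ways: as the difference in total frame time, and as the sum of the depth buffer copy and the resolve pass.

```python
# GPU times in milliseconds, copied from Table 3.1.
# Assumption: the first column is the standard DX10 path and the
# second is the DX10.1 path that uses Gather().
TIMES_MS = {
    "DX10": {
        "sdaa_off": 1.736, "sdaa_on": 2.255,
        "depth_copy": 0.176, "resolve": 0.335,
    },
    "DX10.1": {
        "sdaa_off": 1.736, "sdaa_on": 2.180,
        "depth_copy": 0.176, "resolve": 0.256,
    },
}


def overhead_from_frame(t):
    """Overhead measured as total frame time with SDAA minus without."""
    return t["sdaa_on"] - t["sdaa_off"]


def overhead_from_passes(t):
    """Overhead as the sum of the two passes the text attributes to SDAA."""
    return t["depth_copy"] + t["resolve"]


def relative_saving(before, after):
    """Fraction saved when going from `before` to `after`."""
    return 1.0 - after / before


def summarize(times=TIMES_MS):
    """Return both overhead estimates per path, rounded to 0.001 ms."""
    return {
        path: {
            "frame_delta_ms": round(overhead_from_frame(t), 3),
            "pass_sum_ms": round(overhead_from_passes(t), 3),
        }
        for path, t in times.items()
    }


if __name__ == "__main__":
    for path, row in summarize().items():
        print(path, row)
    resolve = relative_saving(TIMES_MS["DX10"]["resolve"],
                              TIMES_MS["DX10.1"]["resolve"])
    fetches = relative_saving(17, 9)  # fetch counts quoted in the text
    print(f"resolve pass time saved: {resolve:.1%}")
    print(f"texture fetches saved:   {fetches:.1%}")
```

The pass sums come out at 0.511 ms and 0.432 ms, which match the 0.51 ms and 0.43 ms quoted in the text. The frame-time differences are roughly 0.01 ms higher (0.519 ms and 0.444 ms). Cutting the fetch count by about 47% shortens the resolve pass by about 24%.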
                     DX10        DX10.1
SDAA off             1.736 ms    1.736 ms
SDAA on              2.255 ms    2.180 ms
Depth buffer copy    0.176 ms    0.176 ms
Resolve pass         0.335 ms    0.256 ms
Table 3.1. GPU times on an AMD HD5870 at 1,920