of the hardware uses a wider filter. This is hardly objectionable, since a custom
resolve is also needed in order to perform tone mapping on the samples before
the resolve, as required for high-quality antialiasing. We should also note that
if wider reconstruction filters are desirable, they can be used in the luminance
channel only, which is perceptually the most important.
4.6 Blending
Since all operations in the RGB to YCoCg transform are linear, blending can
be performed directly in the YCoCg color space. Therefore, our method directly
supports hardware framebuffer blending without any modification. This is
particularly true when rendering to floating-point render targets, whereas
fixed-point rendering requires some attention.
When rendering to unsigned fixed-point render targets, as noted in Section 4.2,
we have added a bias of 0.5 to the chrominance values in order to keep them
positive, since these buffers do not support signed values. This does not create
any problems with traditional alpha blending, since the bias always remains 0.5
and can easily be subtracted when converting back to RGB. Nevertheless, when
using other blending modes, such as additive blending, the bias will accumulate
and will create clamping artifacts. Furthermore, when additive blending is
used to accumulate N fragments, we should subtract 0.5N from the chrominance
in the framebuffer, but in many cases N is either unknown or difficult to compute.
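To make the issue concrete, the following C sketch (with hypothetical fragment values) shows how the 0.5 bias piles up under additive blending and why the correction requires knowing N; in a real unsigned 8-bit buffer the accumulated value would additionally be clamped to [0, 1], which is the source of the clamping artifacts mentioned above.

#include <stdio.h>

#define BIAS 0.5f

int main(void)
{
    /* Signed chrominance values of the fragments to be accumulated. */
    const float chroma[] = {0.10f, -0.05f, 0.20f, -0.15f};
    const int   N = 4;

    /* What the unsigned fixed-point framebuffer accumulates: each write
       stores the chroma plus the 0.5 bias, so the bias piles up N times.
       (A real 8-bit target would also clamp this sum to [0, 1].) */
    float accumulated = 0.0f;
    for (int i = 0; i < N; ++i)
        accumulated += chroma[i] + BIAS;

    /* Recovering the true chrominance sum requires knowing N. */
    float corrected = accumulated - BIAS * (float)N;

    printf("accumulated = %f (contains %d biases)\n", accumulated, N);
    printf("corrected   = %f\n", corrected);   /* 0.10 in this example */
    return 0;
}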
One possible solution is to perform the blending operation inside the shader, in
the correct [-0.5, 0.5] range, by reading and writing to the same render target
(using the texture_barrier extension), but this approach is limited to specific
platforms and use cases. However, this limitation only concerns certain blending
modes on unsigned 8-bit render targets. High-quality rendering usually requires
HDR formats, which are handled trivially by our method.
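The following C fragment is a minimal sketch of that in-shader approach, assuming the current destination value can be read back (which is what the texture_barrier extension provides on the GPU); the function name and values are illustrative only, and a real implementation would of course live in a fragment shader rather than in C. Because the destination bias is removed before the blend and re-applied afterwards, only a single bias is ever stored, regardless of how many fragments are accumulated.

#include <stdio.h>

#define BIAS 0.5f

/* Additively blend one source chroma value onto the stored (biased) value,
   performing the operation in the signed [-0.5, 0.5] range. */
static float blend_chroma_additive(float stored_biased, float src_chroma)
{
    float dst_chroma = stored_biased - BIAS;     /* remove the storage bias   */
    float result     = dst_chroma + src_chroma;  /* blend in the signed range */
    return result + BIAS;                        /* re-apply the bias to store */
}

int main(void)
{
    float stored = 0.5f;   /* framebuffer starts at true chroma 0 */
    stored = blend_chroma_additive(stored,  0.10f);
    stored = blend_chroma_additive(stored, -0.05f);
    printf("stored = %f, true chroma = %f\n", stored, stored - BIAS);
    return 0;
}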
4.7 Performance
Before providing any GPU measurements, it is worth investigating the theoretical
gains from our method. For a visible fragment, the GPU has to read the old 32-bit
depth value from the Z-buffer in order to perform the depth test, and then it has
to write back the new depth and color information. When blending is enabled,
the old color should also be fetched. Based on this theoretical analysis, we can
calculate that, for a 16-bit half-float render target, our technique reduces the
bandwidth consumption by 25% without blending and by 33% when blending is
enabled. We should note that all our measurements and analysis include a depth
buffer, since in practice rasterization without a Z-buffer is not very common.
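These percentages can be reproduced with a back-of-the-envelope calculation. The sketch below assumes 4 bytes each for the depth read and the depth write, 8 bytes per pixel for a full four-channel half-float color write, and 4 bytes for the two half-float channels stored by the compact format (luminance plus one chrominance value per pixel); these byte counts are assumptions made for illustration, not measurements.

#include <stdio.h>

int main(void)
{
    /* Assumed per-fragment traffic in bytes (illustrative, not measured). */
    const int depth_read  = 4, depth_write   = 4;
    const int color_full  = 8, color_compact = 4;

    /* No blending: depth read + depth write + color write. */
    int full_nb    = depth_read + depth_write + color_full;     /* 16 bytes */
    int compact_nb = depth_read + depth_write + color_compact;  /* 12 bytes */

    /* Blending enabled: the old color is also fetched. */
    int full_b     = full_nb    + color_full;                   /* 24 bytes */
    int compact_b  = compact_nb + color_compact;                /* 16 bytes */

    printf("no blending: %d -> %d bytes, saving %.0f%%\n",
           full_nb, compact_nb, 100.0 * (full_nb - compact_nb) / full_nb);
    printf("blending:    %d -> %d bytes, saving %.0f%%\n",
           full_b, compact_b, 100.0 * (full_b - compact_b) / full_b);
    return 0;
}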
To examine how these theoretical gains translate in practice, we have mea-
sured the fill rate during rasterization. The fill rate measures how fast the GPU
 
 