Game Development Reference
Table 4.1. The time in milliseconds for blitting a 720p render target to the GPU back
fill-rate increase, indicating that the 8-bit two-channel format (GL_RG8) is handled
internally as a four-channel format (GL_RGBA8).
For the measurements above, we used a typical fill-rate benchmark application
that renders large visible quads to a render target. In the compressed case, the
application also performs the compression and decompression of the render
target. The results will of course differ on other GPU architectures, but in
general we can expect the performance gain from the reduced bandwidth to
outweigh the small increase in the ALU instructions used to encode and decode
the color. Although many applications are shader or geometry limited, an
increase in fill rate is always welcome, especially when rendering particles
and other fill-rate-intensive content.
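To give a sense of how little ALU work the encode and decode steps involve, the
following is a minimal CPU-side sketch of a two-channel chrominance-subsampled
encoding. It is an illustration under stated assumptions, not the shader code
used for the measurements: it assumes a YCoCg color transform and a checkerboard
pattern that alternates which chroma component each pixel stores, and the
function names are hypothetical. A real implementation would run in a fragment
shader and would typically reconstruct the missing chroma with a filter over
several neighbors.

```python
def rgb_to_ycocg(r, g, b):
    """Convert linear RGB to luma (Y) and two chroma components (Co, Cg)."""
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg


def ycocg_to_rgb(y, co, cg):
    """Inverse of rgb_to_ycocg."""
    return y + co - cg, y + cg, y - co - cg


def encode_pixel(rgb, x, y):
    """Keep luma plus ONE chroma component, chosen by a checkerboard pattern.

    Assumption for this sketch: even (x + y) stores Co, odd stores Cg.
    """
    luma, co, cg = rgb_to_ycocg(*rgb)
    chroma = co if (x + y) % 2 == 0 else cg
    return luma, chroma


def decode_pixel(stored, neighbor_chroma, x, y):
    """Rebuild RGB from the stored pair and the chroma taken from a neighbor.

    The neighbor holds the component this pixel dropped, because the
    checkerboard alternates between horizontally adjacent pixels.
    """
    luma, chroma = stored
    if (x + y) % 2 == 0:
        co, cg = chroma, neighbor_chroma
    else:
        co, cg = neighbor_chroma, chroma
    return ycocg_to_rgb(luma, co, cg)


# Two adjacent pixels of the same color: each stores only two values.
color = (0.5, 0.25, 0.75)
left = encode_pixel(color, 0, 0)    # (Y, Co)
right = encode_pixel(color, 1, 0)   # (Y, Cg)
decoded_left = decode_pixel(left, right[1], 0, 0)
decoded_right = decode_pixel(right, left[1], 1, 0)
```

In a region of constant color the reconstruction is exact; across color edges
the borrowed chroma is only an approximation, which is why a practical decoder
needs a more careful reconstruction filter than this one-neighbor lookup.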
Another operation worth investigating is the time it takes to decompress and
copy (blit) a compressed render buffer to the GPU back buffer. In our tests, this
operation is performed by rendering a full-screen quad that uses the render
buffer as a texture. With a half-float format, resolving a compressed 720p render
buffer takes 0.19 ms, which is 25% faster than the uncompressed case. Table 4.1
lists the complete measurements for the other precisions. As noted before,
our desktop GPU does not internally support a two-channel 8-bit format, so in
this case our measurements show only the decompression overhead.
Of course, aside from any performance and bandwidth gains, the memory
saved by our method can be used to improve other aspects of an application,
such as the textures or the meshes. As an example, an uncompressed 1080p
render target with 8× MSAA requires 189 MB of storage at 16-bit half-float
precision, while with our method it requires only 126 MB. Both numbers include
the Z-buffer storage.
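The storage figures above can be checked with a short calculation. The sketch
below is not from the original measurements; it assumes a four-channel (RGBA)
uncompressed color buffer, a 32-bit depth buffer, 8 samples per pixel, and
sizes reported in units of 2^20 bytes. Under these assumptions the quoted
numbers are reproduced. The helper name `framebuffer_mib` is hypothetical.

```python
def framebuffer_mib(width, height, samples, color_channels,
                    bytes_per_channel, depth_bytes=4):
    """Color plus depth storage, in units of 2**20 bytes, for a multisampled target.

    depth_bytes=4 assumes a 32-bit depth (or depth/stencil) buffer.
    """
    total_samples = width * height * samples
    bytes_per_sample = color_channels * bytes_per_channel + depth_bytes
    return total_samples * bytes_per_sample / (1024 * 1024)


# 1080p, 8 samples per pixel, 16-bit half-float channels (2 bytes each).
uncompressed = framebuffer_mib(1920, 1080, 8, color_channels=4, bytes_per_channel=2)
compressed = framebuffer_mib(1920, 1080, 8, color_channels=2, bytes_per_channel=2)

print(int(uncompressed), int(compressed))  # 189 126
```

The color storage is halved, but because the depth buffer is unchanged the
total saving is one third rather than one half.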
Conclusion and Discussion
In this chapter we have presented a practical framebuffer compression scheme
based on the principles of chrominance subsampling. Using our method, color
images can be rasterized directly using only two framebuffer channels. The
memory footprint of the framebuffer is reduced along with the consumed bandwidth,
and as a consequence the fill rate of the GPU rasterizer is significantly increased.