Game Development Reference
Practical Framebuffer Compression
Pavlos Mavridis and Georgios Papaioannou
In computer graphics, creating realistic images requires multiple samples per
pixel, to avoid aliasing artifacts, and floating-point precision, to properly
represent the high dynamic range (HDR) of the environmental lighting. Both of
these requirements vastly increase the storage and bandwidth consumption of the
framebuffer. In particular, a multisample framebuffer with N samples per pixel
requires N times more memory. In addition, a 16-bit half-float storage format
doubles the memory and bandwidth requirements of the color buffer compared to
its 8-bit fixed-point equivalent. As an example, a 1080p framebuffer using 8×
MSAA requires 189 MB of memory when using half-float precision for the color
and a 32-bit Z-buffer.
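The 189 MB figure follows directly from multiplying resolution, sample count, and bytes per sample. The short Python sketch below reproduces it, assuming an RGBA16F color format (four half-float channels, 8 bytes per sample) and reading "MB" as 2^20 bytes; `framebuffer_bytes` is a hypothetical helper for illustration, and it ignores any padding or compression metadata a real driver might add.

```python
def framebuffer_bytes(width, height, samples, color_bytes, depth_bytes):
    """Uncompressed size of a multisampled color + depth framebuffer, in bytes."""
    return width * height * samples * (color_bytes + depth_bytes)


MIB = 1024 * 1024  # assumption: "MB" in the text means 2**20 bytes

# 1080p, 8x MSAA, assumed RGBA16F color (4 channels x 2 bytes), 32-bit depth.
hdr = framebuffer_bytes(1920, 1080, 8, color_bytes=8, depth_bytes=4)
# Same setup with RGBA8 color (4 channels x 1 byte) for comparison.
ldr = framebuffer_bytes(1920, 1080, 8, color_bytes=4, depth_bytes=4)

print(f"RGBA16F: {hdr / MIB:.1f} MiB")  # 189.8 MiB
print(f"RGBA8:   {ldr / MIB:.1f} MiB")  # 126.6 MiB
```

Only the color term doubles when moving from 8-bit to half-float storage; the depth term stays at 4 bytes per sample, which is why the total grows by less than a factor of two.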
The total framebuffer memory increases further when using algorithms that
store multiple intermediate render buffers, such as deferred rendering, or
simply when rendering at very high resolutions to drive high-density displays,
a rather recent trend in mobile and desktop computing. The same holds when
driving multiple displays from the same GPU to create immersive setups. All of
these factors vastly increase the consumed memory and put enormous stress on
the memory subsystem of the GPU.
Hardware manufacturers have recognized this problem, and most, if not all,
GPUs shipping today include a form of lossless framebuffer compression.
Although the details of these compression schemes are not publicly disclosed,
based on the performance characteristics of the GPUs, it is rather safe to
assume that these algorithms mostly exploit the fact that, with multisampling,
the fragment shader is executed only once per pixel for each covering
primitive, so the same color is assigned to many subpixel samples. It is worth
noting that, according to information theory, no lossless compression algorithm
can guarantee a fixed-rate encoding for every input, which is needed in order
to provide fast random access. Since memory must still be allocated for the
uncompressed worst case, these algorithms can only save bandwidth but not
storage.
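To make the bandwidth-versus-storage distinction concrete, here is a minimal Python sketch of the kind of scheme speculated about above. It is an assumption for illustration, not any vendor's disclosed design: each pixel stores a flag plus a single color when all of its samples match, and falls back to storing every sample otherwise. The names `encode_pixel`, `decode_pixel`, `color_traffic_bytes`, and `reserved_bytes` are hypothetical, and the 8-byte color assumes RGBA16F.

```python
from typing import List, Tuple

# Hypothetical RGBA sample value; real hardware formats differ.
Color = Tuple[int, int, int, int]
Pixel = List[Color]  # one color per MSAA sample


def encode_pixel(samples: Pixel) -> Tuple[bool, Pixel]:
    """Illustrative lossless encoding (an assumption, not a vendor scheme).

    If all samples share one color, as when a single fragment shader
    invocation covers the whole pixel, keep a flag and one color.
    Otherwise fall back to storing every sample verbatim.
    """
    if all(s == samples[0] for s in samples):
        return True, [samples[0]]
    return False, list(samples)


def decode_pixel(encoded: Tuple[bool, Pixel], num_samples: int) -> Pixel:
    """Invert encode_pixel exactly, so no information is lost."""
    compressed, colors = encoded
    return colors * num_samples if compressed else list(colors)


def color_traffic_bytes(pixels: List[Pixel], bytes_per_color: int = 8) -> int:
    """Color bytes moved for these pixels (per-pixel flag bits are ignored)."""
    return sum(len(encode_pixel(p)[1]) for p in pixels) * bytes_per_color


def reserved_bytes(num_pixels: int, num_samples: int, bytes_per_color: int = 8) -> int:
    """Storage must still fit the worst case: a pixel whose samples all differ."""
    return num_pixels * num_samples * bytes_per_color


if __name__ == "__main__":
    red, blue = (255, 0, 0, 255), (0, 0, 255, 255)
    # Two interior pixels (uniform) and one edge pixel split between two triangles.
    frame = [[red] * 4, [blue] * 4, [red, red, blue, blue]]
    assert all(decode_pixel(encode_pixel(p), 4) == p for p in frame)
    print(color_traffic_bytes(frame))  # 48: (1 + 1 + 4) colors * 8 bytes
    print(reserved_bytes(len(frame), 4))  # 96: 3 pixels * 4 samples * 8 bytes
```

The edge pixel does not shrink at all, and nothing prevents every pixel in a frame from being an edge pixel. The allocation therefore stays at the full 96 bytes, while the traffic for this frame drops to 48 bytes.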