before Huffman coding is applied. Coefficient symbols are variable-length
encoded (VLE) in sequence and concatenated into a JPEG bit stream. Special
marker symbols are used in the following conditions:
1. for every run of 16 consecutive zeros preceding an AC coefficient,
2. when an additional 0x00 byte is appended directly after a byte that equals
0xFF,
3. to indicate end of block (EOB) or end of image (EOI).
Further entropy coding details are beyond the scope of this chapter.
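For orientation only, the first two conditions and the EOB case can be sketched on the CPU in a few lines of C++. This is a minimal illustration, not the chapter's implementation: the function names AcSymbols and StuffBytes are hypothetical, the input is assumed to be one quantized block already in zigzag order, and the Huffman table lookup and bit packing are left out. The symbol values 0xF0 (a run of 16 zeros) and 0x00 (EOB) are the ones used by baseline JPEG.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helpers for illustration; not taken from the chapter's listings.

// Number of bits needed to represent |v| (the JPEG "size" category).
static int BitSize(int v) {
    int a = std::abs(v), n = 0;
    while (a) { ++n; a >>= 1; }
    return n;
}

// Builds the (run << 4 | size) symbols for the 63 AC coefficients of one
// zigzag-ordered block. 0xF0 stands for a run of 16 zeros, 0x00 is EOB.
std::vector<uint8_t> AcSymbols(const int (&zz)[64]) {
    std::vector<uint8_t> out;
    int last = 63;
    while (last > 0 && zz[last] == 0) --last;    // index of last nonzero AC
    int run = 0;
    for (int i = 1; i <= last; ++i) {
        if (zz[i] == 0) { ++run; continue; }
        while (run >= 16) { out.push_back(0xF0); run -= 16; }  // 16-zero marker
        out.push_back(static_cast<uint8_t>((run << 4) | BitSize(zz[i])));
        run = 0;
    }
    if (last < 63) out.push_back(0x00);          // EOB: only zeros remain
    return out;
}

// Appends a 0x00 byte after every 0xFF byte of the entropy-coded data so that
// a decoder cannot mistake the data for a marker.
std::vector<uint8_t> StuffBytes(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    for (uint8_t b : in) {
        out.push_back(b);
        if (b == 0xFF) out.push_back(0x00);
    }
    return out;
}
```

For a block whose only nonzero AC coefficients are 3 at zigzag index 1 and -1 at index 18, AcSymbols yields 0x02, 0xF0, 0x01, 0x00: the 16 zeros between the two coefficients become one 0xF0 symbol, and the trailing zeros collapse into EOB.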
2.2 Implementation
Based on the description in Section 2.1.1, this section describes the
implementation of a baseline JPEG encoder using DirectX 11.0 and Shader Model
5.0. The encoder is designed and implemented using C++ and the Direct3D 11.0
API, making it trivial to use in an existing DirectX 11.0-based renderer. Shader
preprocessor directives are used in the example implementation to control the
output index and texture sampling behavior: this makes it possible to use the
same shader program for all YCbCr components. Each thread group consists of
8 × 8 threads (see Listing 2.3), and the number of dispatched thread groups
is based on the source-image dimensions and chroma subsampling mode. The
following features of Direct3D are beneficial to encoding JPEG data:
full interoperability with all Direct3D resources,
thread-group execution synchronization,
access to group shared memory,
read and write access to resources via unordered access views,
automatic bounds checking when reading from or writing to shader resource views and unordered access views,
atomic intrinsic functions,
hardware rescaling of the source texture.
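How the number of dispatched thread groups might follow from the source-image dimensions and the chroma subsampling mode can be sketched in C++ as shown here. This is an assumption-laden illustration rather than the chapter's code: it assumes one 8 × 8 thread group per 8 × 8 block of the plane being encoded, pads the image to whole MCUs as baseline JPEG requires, and uses the hypothetical names Subsampling, GroupCount, and ThreadGroupsForPlane.

```cpp
#include <cstdint>

// Hypothetical types for illustration; the chapter's own code may differ.
enum class Subsampling { S444, S422, S420 };

struct GroupCount { uint32_t x, y; };

// Returns the thread-group grid for one color plane, assuming each 8x8
// thread group processes one 8x8 block of that plane.
GroupCount ThreadGroupsForPlane(uint32_t width, uint32_t height,
                                bool isChroma, Subsampling mode) {
    // Luma sampling factors: 4:2:2 halves chroma horizontally,
    // 4:2:0 halves it both horizontally and vertically.
    const uint32_t hFactor = (mode == Subsampling::S444) ? 1u : 2u;
    const uint32_t vFactor = (mode == Subsampling::S420) ? 2u : 1u;

    // Number of MCUs, rounded up so partial MCUs at the edges are covered.
    const uint32_t mcusX = (width  + 8u * hFactor - 1u) / (8u * hFactor);
    const uint32_t mcusY = (height + 8u * vFactor - 1u) / (8u * vFactor);

    // Each MCU holds one chroma block and hFactor * vFactor luma blocks.
    if (isChroma) return { mcusX, mcusY };
    return { mcusX * hFactor, mcusY * vFactor };
}
```

For a 1920 × 1080 source with 4:2:0 subsampling, this gives 240 × 136 groups for the luma plane and 120 × 68 groups for each chroma plane. The two values would then be passed as the X and Y arguments of ID3D11DeviceContext::Dispatch, with Z set to 1.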
2.2.1 Performance Considerations
This encoder technique is designed and implemented to minimize global memory
accesses. Global accesses may considerably lower performance if overused, for
example, when the same data is processed in multiple dispatch calls. Therefore
only one dispatch call per color plane is invoked. Loops accessing memory are
manually unrolled to help the compiler. To maintain computation performance,
shared memory registers are reused. The total size of shared members is 32 KB
for data. Group shared memory is smaller in size than 16 KB and may therefore
 