Element                   Resource Type
Image dimensions          Constant buffer
Computation dimensions    Constant buffer
Output block size         Constant buffer
Source texture            Shader resource view
Quantization table        Shader resource view
AC Huffman table          Shader resource view
GPU output result         Unordered access view

Table 2.1. Shader resources used in different instances.
2.2.3 GPU Initialization
Some Direct3D resources can be created at encoding system initialization, while
others have to be created based on encoding parameters. The necessary resource
types used are either textures, constant buffers, or structured buffers that utilize
cache functionality. Constant buffers are used when multiple threads read the
same data. Structured buffers are used when multiple threads read different data.
The output buffer size is calculated based on the JPEG quality and the chroma
subsampling mode, since lower chroma resolution results in fewer computation
blocks and less memory to be copied from the GPU to the CPU. Textures and
structured buffers are accessed as shader resource views, and the output buffer
is accessed as an unordered access view. One compute shader instance is created
per color plane, and the shader resources in Table 2.1 are used in each instance.
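The relationship between chroma subsampling and the number of computation blocks can be sketched as follows. This is only an illustration under stated assumptions: 8x8 blocks, three planes (one luma, two chroma), and plane sizes rounded up to whole blocks. The names Subsampling, PlaneDims, chromaDims, and totalBlocks are hypothetical and do not come from the chapter's listings, and the sketch does not attempt the quality-dependent part of the output buffer size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not taken from the original listings.
enum class Subsampling { S444, S422, S420 };

struct PlaneDims {
    uint32_t width;    // plane width in pixels
    uint32_t height;   // plane height in pixels
    uint32_t blocksX;  // number of 8x8 blocks horizontally
    uint32_t blocksY;  // number of 8x8 blocks vertically
};

// Integer division rounded up, so partial blocks still get processed.
static uint32_t divRoundUp(uint32_t a, uint32_t b) {
    assert(b != 0);
    return (a + b - 1) / b;
}

// The luma plane always keeps the full image resolution.
PlaneDims lumaDims(uint32_t w, uint32_t h) {
    return { w, h, divRoundUp(w, 8), divRoundUp(h, 8) };
}

// Chroma planes are halved horizontally for 4:2:2 and in both
// directions for 4:2:0 (assumed conventional JPEG subsampling).
PlaneDims chromaDims(uint32_t w, uint32_t h, Subsampling s) {
    const uint32_t hdiv = (s == Subsampling::S444) ? 1 : 2;
    const uint32_t vdiv = (s == Subsampling::S420) ? 2 : 1;
    const uint32_t cw = divRoundUp(w, hdiv);
    const uint32_t ch = divRoundUp(h, vdiv);
    return { cw, ch, divRoundUp(cw, 8), divRoundUp(ch, 8) };
}

// Total 8x8 blocks across one luma plane and two chroma planes.
uint32_t totalBlocks(uint32_t w, uint32_t h, Subsampling s) {
    const PlaneDims y = lumaDims(w, h);
    const PlaneDims c = chromaDims(w, h, s);
    return y.blocksX * y.blocksY + 2 * c.blocksX * c.blocksY;
}
```

For a 1920x1080 image, this gives 97,200 blocks at 4:4:4, 64,800 at 4:2:2, and 48,720 at 4:2:0, which is why the lower chroma resolutions need less output memory and less GPU-to-CPU copying.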
2.2.4 Execution
Figure 2.1 illustrates a generic encoding process where only some of the steps are
suited for GPU processing. The execution path used in this technique involves
both GPU and CPU processing. The GPU is used to compute a majority of the encoding
steps, and the CPU is used to stitch together the final JPEG data. Encoding
can take place once all required resources are created. The input data to each encode
invocation includes image width, image height, source-image resource, data
block size, and JPEG quality. Quantization tables are calculated based on JPEG
quality, while computation dimensions are calculated from image dimensions and
subsampling mode. Figure 2.6 illustrates an execution of a single thread group,
where the final output is one DC coefficient, a bitstream of entropy-coded AC
coefficients, and the number of bits occupied by the AC data.
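As an illustration of deriving a quantization table from the JPEG quality, the sketch below scales the example luminance table from Annex K of the JPEG standard using the quality mapping popularized by the IJG libjpeg library. The text does not state which scaling rule the encoder actually uses, so this mapping is an assumption, and the names kBaseLuma and scaleQuantTable are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Example luminance quantization table from Annex K of the JPEG standard,
// in natural (row-major) order.
static const std::array<uint8_t, 64> kBaseLuma = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

// Assumed IJG-style scaling: quality 50 reproduces the base table,
// higher quality shrinks the divisors, lower quality enlarges them.
std::array<uint8_t, 64> scaleQuantTable(int quality) {
    quality = std::max(1, std::min(100, quality));
    const int scale = (quality < 50) ? 5000 / quality : 200 - 2 * quality;

    std::array<uint8_t, 64> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        int q = (kBaseLuma[i] * scale + 50) / 100;  // rounded percentage scale
        q = std::max(1, std::min(255, q));          // keep within 8-bit range, never 0
        table[i] = static_cast<uint8_t>(q);
    }
    return table;
}
```

A table produced this way on the CPU would then be uploaded to the GPU and bound as a shader resource view, matching the quantization table entry in Table 2.1.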
Dispatch thread groups. For each color plane, thread groups are spawned by calling
the device context method Dispatch; see the C++ code in Listing 2.2. The
method takes three input parameters that specify how many thread groups to
spawn in each dimension. Each thread group consists of 64 threads, 8 threads
 