Game Development Reference
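float Get_YCbCr_Component_From_RGB(float3 RGB) {
#if defined(PLANE_Y)   // placeholder define names; one variant is compiled per color plane
    return dot(RGB,float3(0.299,0.587,0.114)) * 255.0f;
#elif defined(PLANE_CB)
    return dot(RGB,float3(-0.168736,-0.331264,0.5)) * 255.0f + 128.0f;
#else                  // PLANE_CR
    return dot(RGB,float3(0.5,-0.418688,-0.081312)) * 255.0f + 128.0f;
#endif
}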
Listing 2.4. RGB to YCbCr.
After execution has finished, each group has computed one 8 × 8 block of partial
JPEG data that is later processed by the CPU. One Dispatch call is issued per
color plane, and all relevant shader resources are set before each invocation.
Compute color transform and chroma subsampling. Source texture data is sampled
using the clamp addressing mode; see details in Section 2.2.2. Depending on the
chroma subsampling mode, each thread group may emulate a larger group size by
sampling multiple texture elements per thread. When encoding with 4:2:0
subsampling, for example, each thread group covers a 16 × 16 pixel block, with
each thread sampling four pixels. Sampled RGB color values are converted to
the YCbCr color space using Equation (2.1). Listing 2.4 shows this conversion
in HLSL, where preprocessor defines select the variant compiled for each color
plane. Converted values are averaged and finally rescaled to the range [0, 255].
Compute forward discrete cosine transform. Before DCT is applied, the color
values are arranged into an 8 × 8 color matrix M. The DCT matrix
multiplications, as in Equation (2.3), are computed in parallel [Kirk and Hwu 10].
Each thread computes one element of a result matrix by summing the products of
the corresponding row and column elements. To avoid data dependency errors,
threads are synchronized before the second multiplication takes place. The HLSL
implementation is given in Listing 2.5.
Compute quantization. After DCT computation, the resulting matrix is quantized:
each DCT coefficient is divided by the corresponding quantization table element.
The resulting floating-point value is written, as shown in Figure 2.4, to an
integer array. See the quantization HLSL code in Listing 2.6.
Calculation of preceding zeros. For any chroma subsampling mode and quality
level, AC coefficients are entropy coded as described in Section 2.1.6. To comply
with the JPEG standard, the run of zeros preceding each nonzero AC coefficient
is counted. Scan primitives provide a method