as current hardware does not allow local on-chip memory usage in rasterization
mode, and the execution of fragment shaders is nondeterministic. Furthermore,
waiting for the result would mean a significant delay for a complex shader.
With our modification we can move the shader evaluation into a deferred
stage, which results in a more coherent fragment shader execution. While we
cannot avoid using the global memory to simulate the memoization cache, the
overhead of decoupled sampling is independent of the shading complexity. This
is the key difference that makes our algorithm feasible even for current GPUs:
if the shading computation is “expensive enough,” the constant overhead of our
caching implementation will be less than the performance gain of reduced shading.
Furthermore, we can utilize our CG-buffer to keep the memory footprint of the
shading data minimal.
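
The break-even argument above can be made concrete with a small cost model. The following C++ sketch is only an illustration under stated assumptions: the struct FrameCost, its fields, and the linear cost formulas are hypothetical and are not taken from the chapter; they simply encode "constant caching overhead per visibility sample, shader cost paid once per shading sample."

```cpp
#include <cassert>

// Hypothetical per-frame cost model; all names and units are illustrative
// assumptions, not measurements or definitions from the chapter.
struct FrameCost {
    double visibilitySamples;  // number of visibility samples (fragments)
    double shadingSamples;     // number of unique shading samples after decoupling
    double shadeCost;          // cost of one shader evaluation (arbitrary units)
    double cacheOverhead;      // constant caching cost per visibility sample
};

// Cost when every visibility sample runs the shader directly.
double coupledCost(const FrameCost& f) {
    return f.visibilitySamples * f.shadeCost;
}

// Cost with decoupled sampling: a constant overhead per visibility sample,
// plus one shader evaluation per unique shading sample.
double decoupledCost(const FrameCost& f) {
    return f.visibilitySamples * f.cacheOverhead
         + f.shadingSamples * f.shadeCost;
}

// Shader cost at which both variants cost the same; decoupling wins in this
// model for any shader more expensive than the returned value.
double breakEvenShadeCost(const FrameCost& f) {
    // Decoupling can only pay off if it actually reduces the sample count.
    assert(f.shadingSamples < f.visibilitySamples);
    return f.cacheOverhead * f.visibilitySamples
         / (f.visibilitySamples - f.shadingSamples);
}
```

In this model the overhead term does not depend on shadeCost, so the saving grows linearly with shader complexity once the break-even point is passed. For example, with 8 visibility samples sharing 2 shading samples and an overhead of 3 units, any shader costing more than 4 units would favor the decoupled variant.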
Decoupling Shading Samples
We now discuss a method that implements the sampling stage of decoupled
deferred shading in a single rasterization pass. The first problem we need to solve
is how to assign shading samples to fragments. Prior to rasterization, each
primitive needs to be processed to determine its shading domain (see Section 3.3.1
in vec2 in_scrPos[];   // screen-space positions of the triangle vertices
flat out ivec4 domain; // shading grid of the triangle
flat out uint startID; // ID of the first sample in the shading grid
uniform float shadingRate;
// global SSID counter array
layout(size1x32) uniform uimageBuffer uCtrSSID;
void main(){
    // project screen position to the shading grid
    vec2 gridPos0 = in_scrPos[0] * shadingRate; [...]
    vec2 minCorner = min(gridPos0, min(gridPos1, gridPos2));
    vec2 maxCorner = max(gridPos0, max(gridPos1, gridPos2));
    // shading grid: xy = top-left corner, zw = grid size
    domain.x = int(minCorner.x) - 1;
    domain.y = int(minCorner.y) - 1;
    domain.z = int(maxCorner.x) - domain.x + 1;
    domain.w = int(maxCorner.y) - domain.y + 1;
    // allocate the ssID range with an atomic counter;
    // imageAtomicAdd returns the counter value before the addition
    uint reserved = uint(domain.z * domain.w);
    startID = imageAtomicAdd(uCtrSSID, 0, reserved);
}
Listing 3.1. The geometry shader generates a shading grid for each triangle and ensures
globally unique ssIDs using an atomic counter.
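
To check the arithmetic of Listing 3.1 outside the GPU, the same steps can be mirrored on the CPU. The C++ sketch below is a hedged illustration, not the chapter's implementation: std::atomic stands in for the uCtrSSID image buffer, and the types Vec2 and Domain as well as the functions computeDomain, reserveSampleIDs, and sampleID are hypothetical names. In particular, the row-major cell-to-ID mapping in sampleID is an assumption; the listing only shows how the ID range is reserved.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

struct Vec2 { float x, y; };

// Mirrors the 'domain' output of the listing:
// x, y = top-left grid cell; z, w = grid size in cells.
struct Domain { int x, y, z, w; };

// Stand-in for uCtrSSID (assumption: a single global counter).
static std::atomic<std::uint32_t> gCtrSSID{0};

// Project the three screen-space vertices to the shading grid and build
// the padded bounding box, following the listing line by line.
Domain computeDomain(const Vec2 scrPos[3], float shadingRate) {
    float gx[3], gy[3];
    for (int i = 0; i < 3; ++i) {
        gx[i] = scrPos[i].x * shadingRate;
        gy[i] = scrPos[i].y * shadingRate;
    }
    const float minX = std::min({gx[0], gx[1], gx[2]});
    const float minY = std::min({gy[0], gy[1], gy[2]});
    const float maxX = std::max({gx[0], gx[1], gx[2]});
    const float maxY = std::max({gy[0], gy[1], gy[2]});

    Domain d;
    d.x = static_cast<int>(minX) - 1;        // truncating cast, like GLSL int()
    d.y = static_cast<int>(minY) - 1;
    d.z = static_cast<int>(maxX) - d.x + 1;
    d.w = static_cast<int>(maxY) - d.y + 1;
    return d;
}

// Reserve one ID per grid cell. fetch_add returns the value before the
// addition, which matches the return value of imageAtomicAdd.
std::uint32_t reserveSampleIDs(const Domain& d) {
    const std::uint32_t reserved = static_cast<std::uint32_t>(d.z * d.w);
    return gCtrSSID.fetch_add(reserved);
}

// Hypothetical row-major lookup of a cell's ID inside a reserved range.
std::uint32_t sampleID(const Domain& d, std::uint32_t startID,
                       int cellX, int cellY) {
    assert(cellX >= d.x && cellX < d.x + d.z);
    assert(cellY >= d.y && cellY < d.y + d.w);
    return startID
         + static_cast<std::uint32_t>((cellY - d.y) * d.z + (cellX - d.x));
}
```

Because each triangle advances the counter by exactly the size of its own grid, the reserved ranges of different triangles cannot overlap, regardless of the order in which the reservations happen. A triangle with vertices (10, 10), (20, 10), (10, 20) and a shading rate of 0.5, for instance, yields a 7 x 7 grid with its top-left cell at (4, 4) and reserves 49 IDs.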