Game Development Reference
void ps_main(PS_IN input)
// Allocate a new element in the list element buffer by
// atomically incrementing the buffer counter.
int counter = listElementBuffer.IncrementCounter();
// Store the required information and apply a depth bias.
listElementBuffer[counter].depth = input.posToShader.z + bias;
listElementBuffer[counter].alpha = input.alpha;
//pixel screen coordinate to buffer index
int index = ( int )(input.pos.y * Width + input.pos.x);
// Atomically exchange the element in the head buffer.
InterlockedExchange(headBuffer[index], counter, originalVal);
// Create the link to the existing list.
listElementBuffer[counter].next = originalVal;
Listing 1.2. Concurrent way of adding a new fragment to the list element buffer and
updating the head buffer in a pixel shader.
elements in parallel. The pixel shader for filling both head buffer and list element
buffer with a new incoming fragment is shown in Listing 1.2.
1.3.2 Processing the Fragments
Note that up to now the list entries neither are sorted nor contain the final
transmittance. These processing tasks are the purpose of this step. The sorting
of all fragments with respect to their depth is done in a separate compute shader
(executed for every pixel). We load a single linked list per pixel into a local array
and do a local sort, which does not require any shared memory. Since neither
compute shaders nor OpenCL support recursion yet, a sorting algorithm like quick
sort is not very well suited to the GPU architecture. Instead we use insertion sort
due to its simplicity and because it is known to be very fast on small arrays (as is
the case for models with reasonable depth complexity). This fact was confirmed
in our experiments, where insertion sort yielded about two times the performance
of a nonrecursive version of quick sort. Next we convert the alpha values in the
linked list into a transmittance function according to Equation (1.1). This means
that we pre-multiply transmittance for each of the depths d i from the linked list
and store the transmittance at each fragment instead of the alpha values. Note
that this step is done to accelerate spatial coherent lookups for filtering later on.
Here, having to traverse the list from the head to reconstruct the transmittance
for each filter sample is exactly what we want to avoid. Furthermore, the prev
links are stored in this step of the algorithm in order to create a double-linked