Game Development Reference

In-Depth Information

not perform well on the GPU. Instead, sorting networks, such as bitonic merge

sort, are preferred because of their parallel nature [Kipfer and Westermann 05].

However, the drawback of sorting networks is that they require lots of passes to

complete the sorting. Recently, the functionality of the GPU has made it possible

to implement radix sort, which requires fewer passes [Grand 07].

Although the radix sort runs quickly on the GPU, the sorting cost of the radix

sort is prohibitively expensive for the sole purpose of improving cache efficiency.

Actually, it took more than the computation of one step on DEM simulation in

our experiment. So it does not meet our goal. There are several sorting algo-

rithms suited for a nearly sorted list, such as insertion sort. They are good for

situations with temporal coherency between frames, like our simulation. But the

problem here is that an insertion sort is a completely sequential algorithm, which

is not good for multiple processors, such as GPUs. But what we want is not a

completion of a sort in a frame because the sort is used just to increase the spatial

coherency of the data. Even if a sort in a time step improves the order of the lists

more or less, it would improve the cache efficiency.

7.4.4 Block Transition Sort

Among sorting networks, we have chosen an odd-even transition sort, a sorting

network that completes a sort by repeating two simple operations: comparing ad-

jacent odd-even index pairs and flipping them if they are in the wrong order, then

comparing adjacent even-odd index pairs. If blocks with an arrow in Figure 7.9

are thought of as two adjacent elements, it shows how the sorting works. Odd-

even transition sort is good for a nearly sorted list but is pretty poor when it is

applied to a random list. If only two adjacent elements are flipped, it can com-

plete the sort in one or two steps. But if they are arranged in the reverse order, it

takes
n
steps to move them to the correct order.

1st pass

2nd pass

3rd pass

4th pass

Figure 7.9.
Block transition sort. An array is divided into blocks, and two adjacent blocks

are sorted in a pass.