Game Development Reference
InDepth Information
p
i
. It is calculated as the summation of the number of voxels from the first slice
S
0
to slice
S
iā
1
. Thus,
p
i
=
i<j
n
j
. Taking the prefix sum of the number of
voxels in the slices gives us the indices of the first voxels.
We are now ready to store the data in the grid. The index of the voxel to
which a point (
x, y, z
) belongs is calculated in two steps. The first step is the
computation of the slice the point is on. It can be calculated by
i
=[(
b
x
B
min
)],
where
B
min
is the minimum coordinate of the slices in the
x
direction. By using
the index of the slice, the first voxel of the slice stored in the table calculated in
the preprocessing step is read. From the index and Equation (7.3), the index of
the voxel is calculated as follows:
v
(
x, y, z
)=
p
i
+
y
ā
.
B
i,
min
d
B
i,
min
d
ā
+
z
ā
n
i
Of course, we can push this slicing concept to another dimension to remove
more empty voxels, i.e., by slicing in the
x
direction before slicing in the
y

direction. However, this is a tradeoff between memory saving and computation;
it adds much more overhead for realtime applications.
Implementation on the GPU.
Before storing particle indices to memory, the
bounding box and the first index of the voxel in each slice have to be calculated.
Although these computations are trivial on a sequential processor, it requires some
effort to perform these on multiple processors. The GPU implementation is ex
plained in the following paragraphs.
Calculating the bounding box and the first voxel index of every slice is per
formed in several steps. The first step is the computation of the bounding boxes
in which memory will be allocated. The grid coordinate calculated from the par
ticle position is inserted in the bounding box of the slice on which the particle
is located. Although the flexibility of current GPUs makes the serial version of
the computation possible on the GPU, it cannot exploit the power of the GPU.
For efficiency reasons, the computation is divided into two stages. The particles
are divided into several sets, and bounding boxes for these sets are computed in
parallel. If there are
m
slices and the particles are divided into
n
sets,
n
sets of
m
bounding boxes are calculated in parallel. (Of course, this is also effective on
other multiple processors.) Then, the results are merged into a set of bounding
boxes. This reduction step can also be parallelized on the GPU.
Here we assume that the
x
axis is taken as the slicing axis. Then, what we
have to compute are
B
i,
max
,B
i,
min
,B
i,
max
,and
B
i,
min
, which are the maximum
and the minimum values for the
y
and
z
directions on
i
th slice of the
x
direction.
Let
n
and
m
be the total number of particles and the number of the small com
putations (we will call them jobs from now on). The
i
th job is responsible for