Game Development Reference
3.4.2 Optimize Solver Calculation
The easiest way to parallelize the solver calculation is to break the
constraints into small independent batches that share no rigid bodies with one
another. Dividing all constraints at once is too complicated, but creating
small independent batches is easy. We then gather these
batches into a group. Independent batches in a group can be processed by SPUs
in parallel. We continue to create groups in the same way until all constraints are
assigned.
Figure 3.14 shows some groups containing batches that can be executed in
parallel. After the calculation of batches in a group is completed, we synchronize
all SPUs and continue to calculate batches for the next group until all groups
are completed. Synchronization is not free, however, and its total cost grows
with the number of groups. Fortunately, the Cell/BE has a mechanism for
synchronizing SPUs without involving the PPU, so the cost of SPU
synchronization stays low enough as long as the number of synchronizations is
not too large.
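The grouping described above can be sketched as a greedy pass over the constraints. This is only a minimal illustration, not the chapter's implementation: the `Constraint` layout, the function name `buildGroups`, and the parameters `batchesPerGroup` and `maxBatchSize` are all assumptions introduced here. Within one group, each rigid body is owned by at most one batch, so the batches of a group could be handed to different SPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal constraint: indices of the two rigid bodies it links.
struct Constraint {
    int bodyA;
    int bodyB;
};

using Batch = std::vector<std::size_t>;  // indices into the constraint array
using Group = std::vector<Batch>;        // batches that share no rigid body

// Greedily partition constraints into groups of independent batches.
// Constraints that do not fit into the current group are retried in the next.
std::vector<Group> buildGroups(const std::vector<Constraint>& constraints,
                               int numBodies,
                               int batchesPerGroup,
                               std::size_t maxBatchSize)
{
    assert(numBodies > 0 && batchesPerGroup > 0 && maxBatchSize > 0);
    constexpr int kNone = -1;
    std::vector<bool> assigned(constraints.size(), false);
    std::size_t remaining = constraints.size();
    std::vector<Group> groups;

    while (remaining > 0) {
        // owner[b] is the batch of the current group that uses body b.
        std::vector<int> owner(static_cast<std::size_t>(numBodies), kNone);
        Group group(static_cast<std::size_t>(batchesPerGroup));

        for (std::size_t c = 0; c < constraints.size(); ++c) {
            if (assigned[c]) continue;
            assert(constraints[c].bodyA >= 0 && constraints[c].bodyA < numBodies);
            assert(constraints[c].bodyB >= 0 && constraints[c].bodyB < numBodies);
            int& ownerA = owner[constraints[c].bodyA];
            int& ownerB = owner[constraints[c].bodyB];

            int target = kNone;
            if (ownerA == kNone && ownerB == kNone) {
                // Both bodies are unused in this group: take the emptiest batch.
                target = 0;
                for (int b = 1; b < batchesPerGroup; ++b)
                    if (group[b].size() < group[target].size()) target = b;
            } else if (ownerA == kNone || ownerB == kNone || ownerA == ownerB) {
                // The bodies already belong to one batch: join that batch.
                target = (ownerA != kNone) ? ownerA : ownerB;
            }
            // target stays kNone when two different batches hold the bodies.
            if (target == kNone || group[target].size() >= maxBatchSize) continue;

            group[target].push_back(c);
            ownerA = target;
            ownerB = target;
            assigned[c] = true;
            --remaining;
        }

        // Keep only the batches that actually received constraints.
        Group nonEmpty;
        for (Batch& batch : group)
            if (!batch.empty()) nonEmpty.push_back(std::move(batch));
        groups.push_back(std::move(nonEmpty));
    }
    return groups;
}
```

Each returned group corresponds to one synchronization point, so a larger `maxBatchSize` tends to produce fewer groups at the price of less even work distribution. The first unassigned constraint always fits into an empty group, so every pass makes progress and the loop terminates.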
Double buffering with two phases. When double buffering in the constraint
solver, we need to be careful about DMA transfers. As described in the previous
section, each SPU calculates an assigned batch. But the constraints within a
batch have data dependencies whenever they share rigid bodies. In the worst
case, the Put and Get DMA operations for the same data occur at the same time,
and double buffering then produces an unpredictable result.
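To make the hazard concrete, the following sketch flags the positions in a batch where a naive double-buffered loop would be unsafe. It assumes a simplified pipeline in which the Get for constraint i+1 is issued before the Put for constraint i has completed, so only neighboring constraints are checked. The type `BatchConstraint` and the function `findPrefetchHazards` are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal constraint: indices of the two rigid bodies it links.
struct BatchConstraint {
    int bodyA;
    int bodyB;
};

// True if the two constraints touch at least one common rigid body.
bool sharesBody(const BatchConstraint& x, const BatchConstraint& y)
{
    return x.bodyA == y.bodyA || x.bodyA == y.bodyB ||
           x.bodyB == y.bodyA || x.bodyB == y.bodyB;
}

// Returns every index i whose prefetch (Get) would overlap with the
// write-back (Put) of constraint i-1 on the same rigid body.
std::vector<std::size_t> findPrefetchHazards(const std::vector<BatchConstraint>& batch)
{
    std::vector<std::size_t> hazards;
    for (std::size_t i = 0; i + 1 < batch.size(); ++i) {
        if (sharesBody(batch[i], batch[i + 1])) {
            hazards.push_back(i + 1);
        }
    }
    return hazards;
}
```

For a batch with the body pairs (0,1), (1,2), (3,4), (5,6), (6,3), the function reports positions 1 and 4: prefetching either of those constraints could read a rigid body that the previous constraint has not yet written back.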
Moreover, calculating one constraint requires the constraint itself and its two
related rigid bodies. The constraint structure holds the links to those rigid
bodies, so we have to get the constraint first, and only then can we get the
two related rigid bodies before the calculation. Such a dependency prevents
straightforward double buffering.
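The dependency can be seen in a small sketch of the fetch sequence for one constraint. Everything here is an assumption made for illustration: the structure layouts are invented, and `dmaGet` is a plain memory copy standing in for a blocking DMA Get, not a Cell/BE SDK call.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical data layouts; real solver structures hold far more state.
struct RigidBodyState {
    float linearVelocity[3];
    float angularVelocity[3];
};

struct LinkedConstraint {
    int bodyA;  // link to the first rigid body
    int bodyB;  // link to the second rigid body
    float bias;
};

// Local-store copies of everything needed to solve one constraint.
struct SolverInput {
    LinkedConstraint constraint;
    RigidBodyState a;
    RigidBodyState b;
};

// Stand-in for a blocking DMA Get from main memory into local store.
template <typename T>
void dmaGet(T& local, const T& remote)
{
    std::memcpy(&local, &remote, sizeof(T));
}

SolverInput fetchConstraintInput(const std::vector<LinkedConstraint>& constraints,
                                 const std::vector<RigidBodyState>& bodies,
                                 std::size_t index)
{
    SolverInput in;
    // Stage 1: the constraint must arrive first, because it holds the links.
    dmaGet(in.constraint, constraints[index]);
    // Stage 2: only now are the rigid body locations known.
    dmaGet(in.a, bodies[static_cast<std::size_t>(in.constraint.bodyA)]);
    dmaGet(in.b, bodies[static_cast<std::size_t>(in.constraint.bodyB)]);
    return in;
}
```

Because stage 2 cannot be issued until stage 1 has completed, a single prefetch buffer cannot hide both transfers behind the calculation of the previous constraint.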
Figure 3.14. Constraints assigned to batches.