3.4.2 Optimize Solver Calculation

The easiest way to parallelize the solver calculation is to break the
constraints into small independent batches that share no rigid bodies with one
another. Dividing all constraints at once is too complicated, but creating a
few small independent batches is easy. We gather these batches into a group;
the independent batches in a group can be processed by SPUs in parallel. We
then create further groups in the same way until all constraints are assigned.
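
The grouping described above can be sketched as a greedy pass over the constraints. This is only a minimal illustration under stated assumptions, not the implementation used in the chapter: the names Constraint, Batch, Group, and buildGroups, the fixed number of batches per group, and the batch size limit are all hypothetical. Within one group, each rigid body is owned by at most one batch, so a constraint whose two bodies are already owned by different batches is deferred to a later group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal constraint record: indices of the two linked rigid bodies.
struct Constraint {
    int bodyA;
    int bodyB;
};

using Batch = std::vector<int>;    // constraint indices solved sequentially by one SPU
using Group = std::vector<Batch>;  // batches that share no rigid bodies

// Greedily builds groups until every constraint is assigned (assumed scheme).
std::vector<Group> buildGroups(const std::vector<Constraint>& constraints,
                               int numBodies, int batchesPerGroup,
                               std::size_t maxBatchSize)
{
    std::vector<Group> groups;
    if (batchesPerGroup <= 0 || maxBatchSize == 0) return groups;

    std::vector<bool> assigned(constraints.size(), false);
    std::size_t remaining = constraints.size();

    while (remaining > 0) {
        Group group(static_cast<std::size_t>(batchesPerGroup));
        // owner[b] is the batch that owns body b in this group, or -1 if none.
        std::vector<int> owner(static_cast<std::size_t>(numBodies), -1);

        for (std::size_t i = 0; i < constraints.size(); ++i) {
            if (assigned[i]) continue;
            const Constraint& c = constraints[i];
            assert(c.bodyA >= 0 && c.bodyA < numBodies);
            assert(c.bodyB >= 0 && c.bodyB < numBodies);

            const int a = owner[c.bodyA];
            const int b = owner[c.bodyB];
            int target = -1;
            if (a == -1 && b == -1) {
                // Both bodies are free: use the least-filled batch.
                target = 0;
                for (int k = 1; k < batchesPerGroup; ++k)
                    if (group[k].size() < group[target].size()) target = k;
            } else if (a == -1 || b == -1 || a == b) {
                // Only one batch is involved: the constraint must join it.
                target = (a != -1) ? a : b;
            }
            // Otherwise the bodies belong to two different batches: defer.
            if (target == -1 || group[target].size() >= maxBatchSize) continue;

            group[target].push_back(static_cast<int>(i));
            owner[c.bodyA] = target;
            owner[c.bodyB] = target;
            assigned[i] = true;
            --remaining;
        }
        groups.push_back(group);
    }
    return groups;
}
```

Each new group starts with no owned bodies, so its first unassigned constraint is always accepted and the loop terminates. For three constraints linking bodies (0, 1), (2, 3), and (1, 2) with two batches per group, the first two constraints land in separate batches of the first group, and the third, which touches both batches, is deferred to a second group.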

Figure 3.14 shows some groups containing batches that can be executed in
parallel. After the batches in a group have been calculated, we synchronize
all SPUs and then calculate the batches of the next group, continuing until
all groups are completed. Synchronization is not free, however: its total cost
grows with the number of groups. Fortunately, Cell/BE has a mechanism for
synchronizing SPUs without involving the PPU, so the cost stays low enough as
long as the number of synchronizations is not too large.
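
The group-by-group execution order can be illustrated with ordinary host threads standing in for SPUs. This is a sketch under assumptions, not Cell/BE code: solveGroups and the solveConstraint callback are hypothetical names, and joining the threads merely plays the role of the SPU synchronization mentioned above. The function returns the number of synchronization points, which equals the number of groups.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Batch = std::vector<int>;    // constraint indices handled by one worker
using Group = std::vector<Batch>;  // batches that share no rigid bodies

// Runs all batches of a group concurrently, waits for every worker, and only
// then starts the next group. Returns how many synchronizations were needed.
std::size_t solveGroups(const std::vector<Group>& groups,
                        const std::function<void(int)>& solveConstraint)
{
    std::size_t syncCount = 0;
    for (const Group& group : groups) {
        std::vector<std::thread> workers;
        workers.reserve(group.size());
        for (const Batch& batch : group) {
            // One worker per batch; constraints inside a batch run in order.
            workers.emplace_back([&batch, &solveConstraint] {
                for (int c : batch) solveConstraint(c);
            });
        }
        // Synchronization point: no batch of the next group may start
        // before every batch of this group has finished.
        for (std::thread& w : workers) w.join();
        ++syncCount;
    }
    return syncCount;
}
```

Because the batches of a group touch disjoint rigid bodies, the callback needs no locking as long as it only writes data belonging to its own constraint and bodies.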

Double buffering with two phases. When using double buffering in the
constraint solver, we need to be careful about DMA transfers. As described in
the previous section, each SPU calculates an assigned batch. However, the
constraints within a batch have data dependencies when they share rigid
bodies. In the worst case, Put and Get DMA operations on the same data occur
at the same time, and double buffering then produces incorrect results.
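
The hazard can be reproduced in a small host-side simulation. This is an illustrative model only, with no real DMA involved: plain array copies stand in for Get and Put, the "solve" step just adds 1 to both bodies, and all names (LocalPair, solveSequential, solveNaiveDoubleBuffered) are hypothetical. The naive loop fetches the bodies of the next constraint before the results of the current one are written back, so a shared body is read in its stale state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Constraint { int bodyA; int bodyB; };  // assumes bodyA != bodyB
struct LocalPair { int a; int b; };           // local-store copy of two bodies

// Reference result: constraints applied strictly one after another.
std::vector<int> solveSequential(std::vector<int> bodies,
                                 const std::vector<Constraint>& cs)
{
    for (const Constraint& c : cs) {
        bodies[c.bodyA] += 1;
        bodies[c.bodyB] += 1;
    }
    return bodies;
}

// Naive double buffering: the Get for constraint i + 1 is issued before the
// Put for constraint i, as if both transfers were in flight together.
std::vector<int> solveNaiveDoubleBuffered(std::vector<int> mainMemory,
                                          const std::vector<Constraint>& cs)
{
    if (cs.empty()) return mainMemory;
    LocalPair buffers[2];
    buffers[0] = {mainMemory[cs[0].bodyA], mainMemory[cs[0].bodyB]};  // first Get

    for (std::size_t i = 0; i < cs.size(); ++i) {
        LocalPair& cur = buffers[i % 2];
        if (i + 1 < cs.size()) {
            // Prefetch (Get) the next pair while the current one is pending.
            buffers[(i + 1) % 2] = {mainMemory[cs[i + 1].bodyA],
                                    mainMemory[cs[i + 1].bodyB]};
        }
        cur.a += 1;  // toy "solve" step
        cur.b += 1;
        mainMemory[cs[i].bodyA] = cur.a;  // Put results back
        mainMemory[cs[i].bodyB] = cur.b;
    }
    return mainMemory;
}
```

With constraints (0, 1) and (1, 2), body 1 is shared: the sequential version updates it twice, while the naive double-buffered version loses the first update. When consecutive constraints share no bodies, both versions agree.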

Moreover, calculating one constraint requires the constraint itself and its
two related rigid bodies. The constraint structure holds links to those rigid
bodies, so we have to get the constraint first; only then can we get the two
related rigid bodies before the calculation. This dependency between transfers
defeats straightforward double buffering.

Figure 3.14. Constraints assigned to batches.