Game Development Reference
Figure 7.15. A screenshot from a real-time simulation using 500,000 particles on four
GPUs. The simulation domain is split by planes perpendicular to the x-axis. The different
particle colors show on which GPU they are calculated (see Color Plate IX).
In this chapter, we have discussed techniques for using multiple processors with
distributed memory for a particle-based simulation. The performance of the method
scales well with the number of processors when particles are distributed evenly
across the computation domains. However, performance degrades when the particle
distribution is not uniform, because the method uses a fixed decomposition of the
computation domain and the processors that own densely populated domains become
the bottleneck. Dynamic load balancing is left for future work. It should be
possible using the data already calculated for the sliced grid, which contains
histograms of the particle distribution along the sliced direction.
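
To make the load-balancing idea concrete, here is a minimal sketch of how a per-slice histogram could be used to place the split planes. It is not the method described in this chapter, only one possible approach under stated assumptions: the histogram is available on the host as a list of particle counts per slice along the x-axis, and the names balanced_cuts and domain_loads are hypothetical helpers introduced for illustration. Each cut is placed at the slice boundary whose running particle count is closest to an equal share of the total.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_cuts(slice_counts, num_domains):
    """Return num_domains - 1 slice indices at which to cut the x-axis.

    slice_counts[i] is the number of particles in slice i (the histogram).
    Domain k owns slices [cuts[k-1], cuts[k]), with 0 and len(slice_counts)
    as the implied outer bounds. Hypothetical helper, for illustration only.
    """
    # prefix[i] = number of particles in slices [0, i)
    prefix = [0] + list(accumulate(slice_counts))
    total = prefix[-1]
    cuts = []
    for k in range(1, num_domains):
        target = total * k / num_domains      # ideal particle count left of cut k
        i = bisect_left(prefix, target)       # first boundary with prefix[i] >= target
        # Pick whichever neighboring boundary is closer to the target.
        if i > 0 and target - prefix[i - 1] <= prefix[i] - target:
            i -= 1
        cuts.append(i)
    return cuts


def domain_loads(slice_counts, cuts):
    """Particle count per domain for a given list of cut indices."""
    bounds = [0] + list(cuts) + [len(slice_counts)]
    return [sum(slice_counts[a:b]) for a, b in zip(bounds, bounds[1:])]


# Example: particles clustered in the middle of the domain, four processors.
histogram = [5, 5, 10, 30, 30, 10, 5, 5]
fixed = domain_loads(histogram, [2, 4, 6])                        # equal-width split
balanced = domain_loads(histogram, balanced_cuts(histogram, 4))   # histogram-driven split
```

In this example the equal-width split assigns 40 particles to each of the two middle domains, while the histogram-driven split lowers the maximum load to 30. The cuts can only fall on slice boundaries, so the balance is limited by the slice width, and a domain may end up empty when most particles fall in a single slice.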