7.5 Data Transfer Using Grids
The sliced grid, in which the computation domain is sliced along the x-axis,
serves as the acceleration structure used to search for neighboring particles.
When the side length of a voxel equals the particle diameter, the data that
have to be sent to an adjacent processor are two contiguous slices. Generally,
the data to be sent are smaller than with a uniform grid, although the
efficiency depends on the distribution of particles.
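The selection of boundary slices described above can be sketched on the CPU as follows. This is a minimal illustration, not the implementation from the text: the helper names build_slices and boundary_particles, the one-dimensional particle list, and the subdomain extent are all assumptions made for the example. The voxel side equals the particle diameter, so two slices at each end of the subdomain are collected.

```python
import math
from collections import defaultdict

def build_slices(xs, x_min, side):
    """Group particle indices by slice index along the x-axis (hypothetical helper)."""
    slices = defaultdict(list)
    for i, x in enumerate(xs):
        slices[math.floor((x - x_min) / side)].append(i)
    return slices

def boundary_particles(slices, first_slice, last_slice, n_send=2):
    """Return the particle indices in the n_send slices at each end of a subdomain."""
    to_left = [i for s in range(first_slice, first_slice + n_send)
               for i in slices.get(s, [])]
    to_right = [i for s in range(last_slice - n_send + 1, last_slice + 1)
                for i in slices.get(s, [])]
    return to_left, to_right

diameter = 1.0                          # voxel side length equals particle diameter
xs = [0.2, 1.5, 1.7, 3.1, 6.4, 7.9]     # assumed subdomain covers x in [0, 8): slices 0..7
slices = build_slices(xs, x_min=0.0, side=diameter)
to_left, to_right = boundary_particles(slices, first_slice=0, last_slice=7)
print(to_left, to_right)
```

Only the particles in the end slices are gathered; particles in the interior slices (here the one at x = 3.1) never leave the processor, which is why the amount sent depends on how the particles are distributed.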
Data cannot be sent directly between GPUs at the moment. Therefore, the data
have to be transferred via main memory in two steps: first, one GPU sends the
data to main memory; then the second GPU reads the data from main memory.
Because each GPU's neighbors do not change, the memory locations a GPU writes
to and reads from are fixed when the domain is spatially decomposed. Figure 7.13
illustrates how this works when using four GPUs for a simulation. Each GPU
computes one subdomain and has one or two ghost regions. After the computation
of a time step, all the GPUs send their data to the predefined locations in
main memory. The GPUs at both ends write particle data to one buffer, and the
other GPUs write to two buffers. To make sure that all the data are ready, all
of the threads are synchronized after the send. Each GPU then reads from its
predefined memory locations, which completes the transfer. As you can see,
these threads run completely in parallel except for one synchronization per
time step.
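The send, synchronize, receive pattern can be sketched with host threads standing in for the GPUs. This is an illustrative assumption, not the code from the text: the mailbox dictionary plays the role of the predefined main-memory buffers, the tuples stand in for particle data, and no real device transfers are performed. Only Python's standard threading.Barrier is used for the synchronization.

```python
import threading

NUM_GPUS = 4

def neighbours(gpu):
    """Adjacent subdomains; fixed once the domain has been decomposed."""
    return [n for n in (gpu - 1, gpu + 1) if 0 <= n < NUM_GPUS]

# Staging buffers in "main memory", allocated at decomposition time.
# mailbox[(src, dst)] holds what worker src sends to neighbour dst.
mailbox = {(g, n): None for g in range(NUM_GPUS) for n in neighbours(g)}

barrier = threading.Barrier(NUM_GPUS)        # one synchronization per time step
ghosts = [dict() for _ in range(NUM_GPUS)]   # ghost-region data received per worker

def worker(gpu):
    # Step 1 (send): end workers fill one buffer, interior workers fill two.
    for n in neighbours(gpu):
        mailbox[(gpu, n)] = ("boundary of", gpu, "for", n)
    # Wait until every worker has finished writing.
    barrier.wait()
    # Step 2 (receive): read what each neighbour wrote for this worker.
    for n in neighbours(gpu):
        ghosts[gpu][n] = mailbox[(n, gpu)]

threads = [threading.Thread(target=worker, args=(g,)) for g in range(NUM_GPUS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(mailbox), sorted(ghosts[1]))
```

With four workers there are six buffers in total: one for each end worker and two for each interior worker, matching the description above. The barrier is the only point at which the workers wait for one another.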
[Figure: Processor0 to Processor3 each compute Subdomain0 to Subdomain3; boundary data pass through main memory in a Send step followed by a Receive step.]
Figure 7.13. Overview of a simulation using four GPUs.
 