Data structure suitable for the SPU. Because the memory capacity of an SPU is limited, the space available for data in local storage is very small. It is therefore best to store all data in main memory and have the SPU transfer only the data necessary for a calculation into local storage using DMA at runtime. When processing is completed, the SPU returns the data to main memory. To use DMA transfer, data must be aligned on a 16-byte boundary; nonaligned data cause DMA alignment-error interrupts. For optimal performance, transfer efficiency rises further if data are aligned on a 128-byte boundary.
In addition, because accessing main memory is slower than accessing local storage, it is important to reduce the number of DMA transfers. An effective way to do this is to store all the information necessary for a single process in a single data structure. If the necessary information is instead scattered across a structure linked by many pointers, many DMA transfers are needed to process it, and performance suffers (see Figure 3.3).
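The layout rules above can be sketched in C. The `Particle` structure below is a hypothetical example: every field a single job needs is embedded directly, with no pointers to chase, and the layout is padded so that both its alignment and its size meet the 16-byte DMA requirement.

```c
#include <stddef.h>

/* Hypothetical per-job record: all data needed by one SPU calculation
 * is embedded in one contiguous structure, so a single DMA transfer
 * can fetch everything. _Alignas(16) on the first member forces the
 * whole struct onto a 16-byte boundary, and the trailing padding keeps
 * sizeof(Particle) a multiple of 16 so arrays of them stay aligned. */
typedef struct Particle {
    _Alignas(16) float pos[4];  /* x, y, z + one float of padding */
    float vel[4];               /* vx, vy, vz + padding            */
    float mass;
    float _pad[3];              /* pad struct size to 48 bytes     */
} Particle;

/* Compile-time checks that the layout satisfies the DMA rules. */
_Static_assert(sizeof(Particle) % 16 == 0,
               "size must be a multiple of 16 bytes");
_Static_assert(_Alignof(Particle) == 16,
               "struct must start on a 16-byte boundary");
```

For the 128-byte case, the same idea applies with `_Alignas(128)` on the first member and padding to a 128-byte multiple.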
Hiding the latency of DMA transfer. SPUs can hide the latency of DMA transfer by executing computation and DMA transfer simultaneously. With double buffering, an SPU computes on one buffer while the other buffer is used for a DMA transfer. A structure connected with pointers, such as a tree or a linked list, is not recommended here: to use double buffering effectively, we need to know the main-memory address of the next data before the current data are processed, but in a tree or linked list the next element becomes known only after the current one has been fetched. If possible, use a simple array as a substitute for a linked structure.
Figure 3.4 shows the mechanism of double buffering. Data in an array in main memory are transferred into local storage (Get). After the data are calculated (Calc), they are returned to main memory (Put) in order. The two buffers in local storage are used in turn: one is used for transfer while the other is used simultaneously for calculation. In this way, main-memory access and calculation are executed in parallel at each step.
Execute tasks in parallel. As described in the previous section, we can greatly improve computation performance by assigning tasks to multiple SPUs and processing them in parallel. If there are dependencies between data, the SPUs must control the order in which tasks are processed using synchronization mechanisms, and synchronization is not free. It is therefore better to gather the data without dependencies before starting tasks.
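One way to apply this idea is to separate the work items up front, so the independent ones can be dispatched to SPUs with no synchronization at all. The helper below is hypothetical, assuming a simple per-item dependency flag; a real job system would derive this from a dependency graph.

```c
#include <stddef.h>

/* Partition n work items into two groups: items with no dependencies
 * (safe to process on multiple SPUs in parallel without
 * synchronization) and items that depend on others (which need
 * ordered, synchronized processing afterwards).
 * deps[i] is nonzero when item i depends on some other item.
 * Indices of independent items are written to the front of `order`,
 * dependent ones to the back; returns the independent count. */
size_t gather_independent(const int *deps, size_t n, size_t *order) {
    size_t front = 0, back = n;
    for (size_t i = 0; i < n; ++i) {
        if (deps[i] == 0)
            order[front++] = i;   /* dispatch these first, in parallel */
        else
            order[--back] = i;    /* handle these with synchronization */
    }
    return front;
}
```

The independent prefix of `order` can then be fanned out across the SPUs in one pass, and the synchronization cost is paid only for the dependent remainder.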