Game Development Reference
right ear before it reaches the left ear. The time delay between the sound reaching each
ear is called the interaural delay. We can approximate the delay for a sound
coming directly from the side by dividing the distance separating the ears by the
speed of sound. In air and for a typical head size, that delay is around half a millisecond.
The delay is shorter when the source is not directly to one side, depending on the orientation of your head with respect to the
sound source. Whatever the delay is, our brains use that information to help determine
the location from which the sound is coming.
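To put rough numbers on this, here is a minimal Python sketch of the estimate just described. The ear separation of 0.18 m and the speed of sound of 343 m/s are assumed typical values, not figures from the text, and scaling the delay by the sine of the source angle is a simplifying assumption that ignores the path the sound takes around the head. The function name interaural_delay is ours, purely for illustration.

```python
import math

SPEED_OF_SOUND_AIR = 343.0  # m/s; assumed value for air at room temperature
EAR_SEPARATION = 0.18       # m; assumed typical distance between the ears


def interaural_delay(azimuth_deg,
                     ear_separation=EAR_SEPARATION,
                     speed_of_sound=SPEED_OF_SOUND_AIR):
    """Estimate the interaural delay in seconds.

    azimuth_deg is the source angle measured from straight ahead:
    0 means directly in front, 90 means directly to one side.
    Assumes a simple straight-line path difference between the ears.
    """
    # Delay for a source directly to the side: ear separation / speed of sound.
    max_delay = ear_separation / speed_of_sound
    # Assumed simplification: scale by sin(azimuth) for other orientations.
    return max_delay * math.sin(math.radians(azimuth_deg))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(f"{angle:3d} degrees: {interaural_delay(angle) * 1000.0:.3f} ms")
```

With these assumed values, a source directly to the side gives a delay of roughly half a millisecond, and a source straight ahead gives no delay at all.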
Additionally, as the sound coming from the right in Figure 26-9 reaches the head, some
of the energy is reflected off the head. Reflections also occur off the shoulders and torso.
Further, as the sound waves pass the head they tend to bend around it. Higher-frequency
waves tend to get blocked by the head, and lower-frequency waves tend to pass by with
little interruption. The resulting sound in the shadow region behind the head is somewhat
different from the source due to the effective filtering that has occurred via interaction
with the head. Also, notice that each ear is oriented differently with respect to the
sound source, so sound waves will interact with each ear and ear canal
differently.
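One crude way to imitate this head-shadow filtering for the far ear is a simple low-pass filter. The sketch below assumes a one-pole filter with a hypothetical 1,500 Hz cutoff and a 44,100 Hz sample rate; both numbers and the name head_shadow_lowpass are ours, chosen only for illustration. It does not model reflections, diffraction, or the shape of the ear.

```python
import math


def head_shadow_lowpass(samples, sample_rate=44100.0, cutoff_hz=1500.0):
    """Apply a one-pole low-pass filter to a list of samples.

    This is an assumed, rough stand-in for head shadowing: low
    frequencies pass nearly unchanged, high frequencies are reduced.
    """
    # Smoothing coefficient of a one-pole filter for the chosen cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    output = []
    previous = 0.0
    for x in samples:
        # Move the output a fraction alpha of the way toward the input.
        previous += alpha * (x - previous)
        output.append(previous)
    return output


if __name__ == "__main__":
    steady = head_shadow_lowpass([1.0] * 2000)
    rapid = head_shadow_lowpass([1.0 if n % 2 == 0 else -1.0 for n in range(2000)])
    print("steady input settles near:", round(steady[-1], 3))
    print("rapidly alternating input peak:", round(max(abs(v) for v in rapid[-100:]), 3))
```

A steady (very low frequency) input passes through essentially unchanged, while an input alternating at the highest representable frequency comes out much quieter, which is the qualitative behavior described above.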
If the sound is coming from above or below the person in addition to being offset
laterally, the sound will reflect off and diffract around different parts of the body in
different ways.
Considering all these interactions, it would seem that the sound we end up hearing is
quite different from the pure source sound. Well, the differences may not be that dramatic,
but they are sufficient to allow our brains to pick up on all these cues, allowing
us to locate the sound source. Given that we are all different shapes and sizes, our brains
are tuned to our specific bodies when processing these localization cues.
It would seem that believable 3D sound is virtually impossible to achieve in
games given the complexity of sounds interacting with the listener. Certainly you can't
model every potential game player along with your game sounds to compute how they
interact with each other. That said, one approach to capturing the important localization
cues is to use what are called head-related transfer functions (HRTFs).
If you were to place a small microphone in each ear and then record the sound reaching
each ear from some known source, you'd have what is called a binaural recording. In
other words, the two recordings—one for each ear—capture the sound received by each,
which, given all the factors we described earlier, are different from each other. These
two recordings contain information that our brains use to help us localize the source.
Now, if you compare these binaural recordings by taking the ratio of each to the source
sound, you'd end up with what's called a transfer function for each ear. (The math is
more complicated than we imply here.) These are the HRTFs. And you can derive an
HRTF for a sound located at any position relative to a listener. So, the binaural recordings