Game Development Reference
Figure 2.5. A visualization of the skeletal information available from the Kinect.
2.2.3 Skeletal Tracking
As discussed earlier in the chapter, one of the biggest advances that the Kinect
provides is the ability to view a user with the sensing systems we have just
described and to discern where they are within the scene and what pose they
are holding. The Kinect makes this possible by analyzing each pixel of the
depth frame and applying a decision tree algorithm to determine to which part
of a human body that pixel is most likely to belong [Shotton et al. 11]. All of this
work is largely hidden from the developer; we simply receive the benefit of
knowing what the user's pose is in any given frame.
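To make the per-pixel classification idea more concrete, the following is a toy sketch only, loosely in the spirit of the depth-comparison features described in [Shotton et al. 11]. The real system uses many trees trained on a large data set; here a single hand-built tree is used, and every name, offset, threshold, and body-part label (Leaf, Split, probe, classify_pixel, BACKGROUND_DEPTH) is a hypothetical choice made for illustration, not part of the Kinect SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumption: probes that leave the frame (or start from an invalid pixel)
# read as a very large "background" depth.
BACKGROUND_DEPTH = 1e6


@dataclass
class Leaf:
    label: str  # hypothetical body-part label, e.g. "head"


@dataclass
class Split:
    offset_u: Tuple[float, float]  # (dx, dy) probe offset, scaled by 1/depth
    offset_v: Tuple[float, float]
    threshold: float
    below: Union["Split", Leaf]  # taken when feature < threshold
    above: Union["Split", Leaf]  # taken otherwise


def probe(depth: List[List[float]], x: int, y: int,
          offset: Tuple[float, float]) -> float:
    """Read the depth at an offset from (x, y), scaling the offset by 1/depth."""
    d = depth[y][x]
    if d <= 0:  # a depth of zero is treated as an invalid reading
        return BACKGROUND_DEPTH
    px = x + int(round(offset[0] / d))
    py = y + int(round(offset[1] / d))
    if 0 <= py < len(depth) and 0 <= px < len(depth[0]):
        return depth[py][px]
    return BACKGROUND_DEPTH


def classify_pixel(depth: List[List[float]], x: int, y: int,
                   node: Union[Split, Leaf]) -> str:
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(node, Split):
        feature = (probe(depth, x, y, node.offset_u)
                   - probe(depth, x, y, node.offset_v))
        node = node.below if feature < node.threshold else node.above
    return node.label


# Hand-built one-level tree: "is there background directly above this pixel?"
toy_tree = Split(offset_u=(0.0, -2.0), offset_v=(0.0, 0.0), threshold=0.5,
                 below=Leaf("torso"), above=Leaf("head"))

# Tiny 3x3 depth frame in which every pixel is 2.0 units from the sensor.
toy_depth = [[2.0, 2.0, 2.0],
             [2.0, 2.0, 2.0],
             [2.0, 2.0, 2.0]]

top_label = classify_pixel(toy_depth, 1, 0, toy_tree)
middle_label = classify_pixel(toy_depth, 1, 1, toy_tree)
```

Dividing the probe offsets by the depth at the pixel keeps the feature roughly independent of how far the user stands from the sensor, which is the main reason a comparison of this shape is attractive for depth images.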
In general, the skeletal information that is available is quite similar to the
skeletal information that one would expect when rendering a skinned model
[Fernando and Kilgard 03]. (See Figure 2.5.) Each joint is represented by an absolute
position and optionally an orientation that describes that portion of the model.
In recent releases of the Kinect for Windows SDK, there is even support for
different types of skeletons. For example, when a user is standing, it is possible to
obtain a full 20-joint skeleton. However, when a user is sitting, it is also possible
to obtain a smaller skeleton that includes only a reduced subset of 10 joints
corresponding to the upper body. This allows for a wide variety of usage scenarios
and gives the developer freedom to choose how to interact with the user.
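The skeleton description above can be sketched as a small data structure. This is a minimal illustration, not the SDK's own types: Joint, Skeleton, and make_skeleton are hypothetical names, and the joint names are assumed to follow the naming of the Kinect for Windows SDK 1.x joint enumeration, so they should be checked against the SDK before being relied upon.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]          # absolute position
Quat = Tuple[float, float, float, float]   # optional orientation (x, y, z, w)

# Assumed joint names for the full, standing skeleton (20 joints).
FULL_JOINTS = (
    "HipCenter", "Spine", "ShoulderCenter", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft",
    "ShoulderRight", "ElbowRight", "WristRight", "HandRight",
    "HipLeft", "KneeLeft", "AnkleLeft", "FootLeft",
    "HipRight", "KneeRight", "AnkleRight", "FootRight",
)

# Assumed upper-body subset reported for a seated user (10 joints).
SEATED_JOINTS = (
    "ShoulderCenter", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft",
    "ShoulderRight", "ElbowRight", "WristRight", "HandRight",
)


@dataclass
class Joint:
    position: Vec3
    orientation: Optional[Quat] = None  # not every joint carries one


@dataclass
class Skeleton:
    joints: Dict[str, Joint]

    def is_seated(self) -> bool:
        """True when only the reduced upper-body joint set is present."""
        return set(self.joints) == set(SEATED_JOINTS)


def make_skeleton(positions: Dict[str, Vec3], seated: bool = False) -> Skeleton:
    """Build a skeleton from joint positions, checking the expected joint set."""
    names = SEATED_JOINTS if seated else FULL_JOINTS
    missing = [name for name in names if name not in positions]
    if missing:
        raise ValueError(f"missing joints: {missing}")
    return Skeleton({name: Joint(positions[name]) for name in names})


# Usage: place every joint at the origin just to exercise the structure.
standing = make_skeleton({name: (0.0, 0.0, 0.0) for name in FULL_JOINTS})
sitting = make_skeleton({name: (0.0, 0.0, 0.0) for name in SEATED_JOINTS},
                        seated=True)
```

Keying the joints by name lets the same application code handle both skeleton types: it can simply ask whether a given joint is present rather than branching on the tracking mode.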
Mathematics of the Kinect
In this section we will look at the mathematics required to interpret the various
camera spaces in the Kinect and to develop the needed concepts for matching
objects in each space together. As we have just seen, the Kinect has two different
camera systems, producing color and depth images. Both of these cameras can
be handled in the same manner, using the pinhole camera model. Understanding
this model will provide the necessary background to take an object found in one
of the camera images and then determine to what that object correlates in the