Game Development Reference
In-Depth Information
Section 5.3). In such cases, all we can hope for is an e cient algorithm which
approximates val f ( v ) up to an arbitrarily small given precision.
Question (4) also requires special attention in the case of infinite-state
games, because then even MD strategies do not necessarily admit a finite
description. Since the vertices of infinite-state games typically carry some
algebraic structure, optimal strategies may depend just on some finite infor-
mation gathered by analysing the structure of vertices visited along a run in
a play.
The existing results
In this section we give a short summary of the existing results about turn-
based stochastic games with Borel measurable payoffs. We start with finite-
state games.
In general, optimal strategies in finite-state games with the Borel mea-
surable payoff functions introduced in the previous sections exist, do not
need to randomise, and are either memoryless or finite-memory. In some
cases, memory can be 'traded' for randomness (see Chatterjee et al. [2004a]).
The values are rational (assuming that transition probabilities in games are
rational) and computable. The main techniques for establishing these results
are the following:
Strategy improvement. This technique was originally developed for
general stochastic games (see Hoffman and Karp [1966]). An initial strategy
for one of the players is successively improved by switching it at positions
at which the current choices are not locally optimal.
Value iteration. The tuple of all values is computed/approximated
by iterating a suitable functional on some initial vector. For example,
the Bellman functional Γ defined in Section 5.3.1 can be used to com-
pute/approximate the values in games with reachability payoffs.
Convex optimisations. These methods are particularly useful for MDPs
(see Puterman [1994], Filar and Vrieze [1996]). For example, the tuple of
values in MDPs with reachability payoffs is computable in polynomial time
by a simple linear program (see below).
Graph-theoretic methods. Typically, these methods are used to design
e cient (polynomial-time) algorithms deciding whether the value of a
given vertex is equal to one (see, e.g., Chatterjee et al. [1994]).
The individual payoff functions are discussed in greater detail below.
Reachability. Turn-based stochastic games with reachability payoffs were
first considered by Condon [1992] where it was shown that the problem of
whether val ( v ) > 2 for a given vertex v is in NP coNP . It was also observed