Game Development Reference
In-Depth Information
that both players have optimal MD strategies. The algorithm proposed by
Condon [1992] requires transformation of the original game into a stopping
game where local optimality equations admit a unique solution (i.e., the
functional Γ defined in Section 5.3.1 has a unique fixed-point) which is a
tuple of rational numbers of linear size. The tuple can then be guessed and
verified (which leads to the NP
coNP upper bound), or computed by a
quadratic program with linear constraints. This algorithm is exponential
even if the number of random vertices is fixed. Randomised algorithms with
sub-exponential expected running time were proposed by Halman [2007]
and Ludwig [1995]. A deterministic algorithm for computing the values and
optimal strategies with
is
the maximum bit length of a transition probability, was recently proposed
by Gimbert and Horn [2008]. This algorithm is polynomial for every fixed
number of random vertices.
Let us note that the exact complexity of the problem of whether val ( v ) > 2
remains unsettled, despite substantial effort of the community. Since the
problem belongs to NP coNP , it is not likely to be NP or coNP complete.
At the same time, it is not known to be in P . On the other hand, the
qualitative variant of the problem (i.e., the question whether val ( v ) = 1) is
solvable in polynomial time (this follows, e.g., from a more general result
about Buchi objectives achieved by de Alfaro and Henzinger [2000]). For
MDPs, the values and optimal strategies are computable in polynomial time
(both in the maximising and minimising subcase) by linear programming.
Given a MDP G =( V,
O
(
|
V |
!
·
(
|
V
|·|→|
+
|
p
|
)) running time, where
|
p
|
and v n is
the only target vertex, the tuple of all val ( v i ) is computable by the following
program (a correctness proof can be found in, e.g., Filar and Vrieze [1996]):
minimise y 1 + ··· + y n
subject to
y n =1
y i
, ( V ,V ) , Prob ) where V =
{
v 1 ,...,v n }
y j
for all v i
v j where v i
V and i<n
y i = v i
→v j x
·
y j
for all v i
V , i<n
x
n .
An optimal maximising strategy can be constructed in polynomial time even
naively (i.e., for each v ∈ V we successively identify a transition v → v such
that the tuple of values does not change when the other outgoing transitions
of v are removed).
(co-)Buchi. In turn-based stochastic games with Buchi and co-Buchi
payoffs, both players have optimal MD strategies, the problem of whether
val ( v )
y i
0
for all i
for a given rational
[0 , 1] is in NP coNP , and the problem