Figure 5.3 An optimal maximising strategy does not necessarily exist
(b) an optimal maximising strategy in v (if it exists) may require infinite
memory even if G is finitely branching;
(c) if G has finitely many vertices, then there is a MD strategy which is
optimal maximising in every vertex of G.
A counterexample for claim (a) is given in Figure 5.3 (target vertices are grey). Observe that val(v) = 1 but there is no optimal maximising strategy in v.
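This can be sketched numerically. The successor structure below is an assumption (Figure 5.3 is not reproduced here): suppose v offers successors s_i with val(s_i) = 1 − (1/2)^i, matching the values used in the Figure 5.4 construction discussed next. Then val(v) = sup_i val(s_i) = 1, yet no single choice attains it:

```python
# Hypothetical sketch: ASSUME v can move to successors s_i with
# val(s_i) = 1 - (1/2)^i (Figure 5.3 itself is not shown).
# Then sup_i val(s_i) = 1 = val(v), but every fixed choice falls short,
# so eps-optimal strategies exist for every eps > 0 while no optimal
# maximising strategy exists.
def val_s(i: int) -> float:
    return 1 - 0.5 ** i

values = [val_s(i) for i in range(1, 31)]
assert all(value < 1 for value in values)  # every fixed successor is suboptimal
assert 1 - max(values) < 1e-6              # ...but the shortfall shrinks to 0
```

In other words, the supremum defining val(v) is approached but never reached, which is exactly the gap between ε-optimality and optimality.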
An example demonstrating that an optimal maximising strategy may require infinite memory (originally due to Brozek [2009]) is given in Figure 5.4. The outgoing transitions of the vertex v̄ are the same as the outgoing transitions of the vertex v in Figure 5.3 and they are not shown in the picture. Observe that val(v̄) = 1 and hence val(e_i) = 1 for all i ≥ 1. Further, we have that val(s_i) = 1 − (1/2)^i for all i ≥ 1, and hence also val(d_i) = 1 − (1/2)^i for all i ≥ 1. From this we get val(v) = 2/3. Also observe that player □ has a HD optimal maximising strategy σ which simply ensures that player ♦ cannot gain anything by using the transitions d_i → e_i. That is, whenever the vertex v̄ is visited, the strategy σ finds a d_i stored in the history of the play, and starts to behave as an ε-optimal maximising strategy in v̄, where ε < (1/2)^i. Thus, σ achieves the result 2/3 or better against every strategy of player ♦. However,
for every finite-memory strategy σ of player □ there is a fixed constant P_σ < 1 such that P_v^{(σ,π)}(Reach(T)) ≤ P_σ for every π ∈ Π. Since P_σ < 1, there surely exists j ≥ 1 such that P_σ < 1 − (1/2)^k for all k > j. Now let π be a MD strategy of player ♦ which in d_i selects either the transition d_i → e_i or d_i → s_i, depending on whether i ≤ j or not, respectively. Now one can easily check that P_v^{(σ,π)}(Reach(T)) < 2/3, which means that σ is not an optimal maximising strategy in v.
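The value val(v) = 2/3 and the existence of the index j used in this argument can both be checked numerically. The branching probabilities (1/2)^i from v to d_i in the sketch below are an assumption about the shape of Figure 5.4, which is not reproduced here; they are the simplest weights consistent with the stated values val(d_i) = 1 − (1/2)^i and val(v) = 2/3:

```python
from math import isclose

# ASSUMED shape of Figure 5.4 (the figure is not shown): v moves to d_i
# with probability (1/2)^i, and val(d_i) = 1 - (1/2)^i.  Under that
# assumption, val(v) = sum_i (1/2)^i * (1 - (1/2)^i) = 1 - 1/3 = 2/3.
val_v = sum(0.5 ** i * (1 - 0.5 ** i) for i in range(1, 60))
assert isclose(val_v, 2 / 3, rel_tol=1e-9)

# For any finite-memory bound P_sigma < 1, find the smallest j such that
# P_sigma < 1 - (1/2)^k for all k > j, i.e. (1/2)^(j+1) < 1 - P_sigma.
def smallest_j(p_sigma: float) -> int:
    j = 0
    while not 0.5 ** (j + 1) < 1 - p_sigma:
        j += 1
    return j

j = smallest_j(0.9)
assert all(0.9 < 1 - 0.5 ** k for k in range(j + 1, j + 40))
```

The second half mirrors the proof step: because P_σ is a fixed bound strictly below 1, only finitely many of the thresholds 1 − (1/2)^k lie below it, which is what the counter-strategy π exploits.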
Let us note that Claim (b) of Proposition 5.7 does not hold for MDPs. By
applying Theorem 7.2.11 of Puterman [1994], we can conclude that if there
is some optimal maximising strategy in a vertex v of a (possibly infinitely