Figure 5.1 A finite-state Markov chain
the probability (2/3)^{i−1} · (1/3), hence P(Reach_i(s,t)) = (2/3)^{i−1} · 1/3. Since Reach_i(s,t) ∩ Reach_j(s,t) = ∅ whenever i ≠ j, we obtain

    P(Reach(s,t)) = Σ_{i=1}^∞ P(Reach_i(s,t)) = Σ_{i=1}^∞ (2/3)^{i−1} · (1/3) = 1
as expected. Also note that Run(s) \ Reach(s,t) is uncountable but its probability is zero.
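As a quick numerical sanity check (my own illustration, not part of the text), the geometric series above can be evaluated directly: the partial sums of Σ (2/3)^{i−1} · (1/3) approach 1, matching the closed form 1 − (2/3)^n.

```python
# Illustrative check of the series P(Reach(s,t)) = sum_{i>=1} (2/3)**(i-1) * (1/3),
# a geometric series with ratio 2/3 whose total is 1.

def partial_sum(n: int) -> float:
    """Sum of P(Reach_i(s,t)) for i = 1..n."""
    return sum((2 / 3) ** (i - 1) * (1 / 3) for i in range(1, n + 1))

print(partial_sum(10))   # close to 1, equals 1 - (2/3)**10
print(partial_sum(100))  # numerically indistinguishable from 1
```

The partial sums telescope to 1 − (2/3)^n, so the total probability of eventually reaching t is 1, even though the set of runs never reaching t is uncountable.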
Turn-based stochastic games and Markov decision processes
A (turn-based) stochastic game is a tuple G = (V, →, (V_□, V_◇, V_◯), Prob) where (V, →) is a transition system, (V_□, V_◇, V_◯) is a partition of V, and Prob is a function which to each v ∈ V_◯ assigns a positive probability distribution on the set of its outgoing transitions. We say that G is finitely branching if (V, →) is finitely branching. A Markov decision process (MDP) is a stochastic game where V_◇ = ∅. Note that Markov chains can be seen as stochastic games where V_□ = V_◇ = ∅, or equivalently V = V_◯.
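To make the definition concrete, here is a small Python sketch (my own illustrative encoding; the class and field names are not from the text) of a finite turn-based stochastic game, together with a check that (V_□, V_◇, V_◯) partitions V and that Prob assigns a positive distribution to the outgoing transitions of every stochastic vertex.

```python
# Illustrative sketch of a finite stochastic game G = (V, ->, (V_box, V_diamond, V_circle), Prob).
# Names are my own; the structure follows the definition in the text.
from dataclasses import dataclass

@dataclass
class StochasticGame:
    transitions: dict[str, list[str]]   # v -> its outgoing transitions (successor vertices)
    v_box: set[str]                     # vertices of player Box
    v_diamond: set[str]                 # vertices of player Diamond
    v_circle: set[str]                  # stochastic vertices
    prob: dict[str, dict[str, float]]   # Prob: distribution over outgoing transitions, for v in V_circle

    def validate(self) -> None:
        vertices = set(self.transitions)
        # (V_box, V_diamond, V_circle) must be a partition of V
        assert self.v_box | self.v_diamond | self.v_circle == vertices
        assert not (self.v_box & self.v_diamond)
        assert not (self.v_box & self.v_circle)
        assert not (self.v_diamond & self.v_circle)
        # Prob assigns a positive distribution on the outgoing transitions of each v in V_circle
        for v in self.v_circle:
            dist = self.prob[v]
            assert set(dist) == set(self.transitions[v])
            assert all(p > 0 for p in dist.values())
            assert abs(sum(dist.values()) - 1.0) < 1e-9

    def is_mdp(self) -> bool:
        # An MDP is a stochastic game where V_diamond is empty
        return not self.v_diamond

    def is_markov_chain(self) -> bool:
        # A Markov chain: V_box = V_diamond = empty, i.e. V = V_circle
        return not self.v_box and not self.v_diamond
```

For instance, a three-state Markov chain over {s, u, t} is a game where all vertices are stochastic; it then trivially also qualifies as an MDP under the definition above.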
A stochastic game is played by two players, □ and ◇, who select transitions in the vertices of V_□ and V_◇, respectively. Let ⊙ ∈ {□, ◇}. A strategy for player ⊙ is a function which to each wv ∈ V*V_⊙ assigns a probability distribution on the set of outgoing transitions of v. The sets of all strategies for player □ and player ◇ are denoted by Σ_G and Π_G (or just by Σ and Π if G is understood), respectively. We say that a strategy τ is memoryless (M) if τ(wv) depends just on the last vertex v, and deterministic (D) if τ(wv) is Dirac for all wv. Strategies that are not necessarily memoryless are called history-dependent (H), and strategies that are not necessarily deterministic are called randomised (R). A special type of history-dependent strategies are finite-memory (F) strategies. A strategy τ of player ⊙ is a finite-memory strategy if there is a finite set C = {c_1, ..., c_n} of colours, a colouring ν : V → C, and a deterministic finite-state automaton M over the alphabet C such that for every wv ∈ V*V_⊙ we have that τ(wv) depends only