Game Development Reference

In-Depth Information

1

3

1

3

1

3

s

u

1

3

1

3

1

3

t

1

Figure 5.1 A finite-state Markov chain

M

the probability (
3
)
i
, hence

(
Reach
i
(
s, t
))=(
3
)
i−
1

1

3
. Since
Reach
i
(
s, t
)

P

·

∩

Reach
j
(
s, t
)=

∅

whenever
i

=
j
, we obtain

∞

Reach
i
(
s, t
)
=
∞

2

3

i−
1

(
Reach
i
(
s, t
)) =
∞

1

3
=1

P

(
Reach
(
s, t
)) =

P

P

·

i
=1

i
=1

i
=1

as expected. Also note that
Run
(
s
)

\

Reach
(
t
) is uncountable but its proba-

bility is zero.

Turn-based stochastic games and Markov decision processes

A
(turn-based) stochastic game
is a tuple
G
=(
V,

→

,
(
V
,V
♦
,V
)
, Prob
)

where (
V,

) is a transition system, (
V
,V
♦
,V
) is a partition of
V
, and
Prob

is a function which to each
v

→

V
assigns a positive probability distribution

on the set of its outgoing transitions. We say that
G
is
finitely branching

if (
V,

∈

) is finitely branching. A
Markov decision process (MDP)
is a

stochastic game where
V
♦
=

→

. Note that Markov chains can be

seen as stochastic games where
V
=
V
♦
=

∅

or
V
=

∅

.

A stochastic game is played by two players,

∅

and

♦

, who select transi-

tions in the vertices of
V
and
V
♦
, respectively. Let

∈{

,

♦
}

.A
strategy

V
∗
V
assigns a probability

distribution on the set of outgoing transitions of
v
. The sets of all strategies

for player

for player

is a function which to each
wv

∈

are denoted by Σ
G
and Π
G
(or just by Σ and Π

if
G
is understood), respectively. We say that a strategy
τ
is
memoryless

(M)
if
τ
(
wv
) depends just on the last vertex
v
, and
deterministic (D)
if

τ
(
wv
) is Dirac for all
wv
. Strategies that are not necessarily memoryless are

called
history-dependent (H)
, and strategies that are not necessarily de-

terministic are called
randomised (R)
. A special type of history-dependent

strategies are
finite-memory (F)
strategies. A strategy
τ
of player

and player

♦

is a

finite-memory strategy if there is a finite set
C
=

{

c
1
,...,c
n
}

of
colours
,a

colouring ν
:
V

C
, and a deterministic finite-state automaton
M
over the

alphabet
C
such that for every
wv

→

V
∗
V
we have that
τ
(
wv
) depends only

∈

Search Nedrilad ::

Custom Search