Game Development Reference
In-Depth Information
the average reward per vertex visited along w . For every n
1, let
avg n ( w )= n n− 1
i =0 r ( w ( i )). Since lim n→∞ avg n ( w ) may not exist for a
given run w , the mean-payoff function appears in two flavours, defined as
follows:
avg n ( w )
avg n ( w ) .
avg sup ( w ) = lim sup
n→∞
avg inf ( w ) = lim inf
n→∞
Let us note that if the players play in a su ciently 'weird' way, it may
happen that avg sup ( w )
= avg inf ( w ) for all runs w in a given play, even
if the underlying game G has just two vertices.
Mean-payoff functions were introduced by Gillette [1957]. For finite-state
games, it is sometimes stipulated that the aim of player
is to maximise
the expectation of avg sup and the aim of player ♦ is to minimise the
expectation of avg inf . For finite-state games, we have that
( avg sup σ, v ) = inf
( avg inf σ, v )
sup
σ∈ Σ
π∈ Π E
inf
π∈ Π sup
σ∈ Σ E
(5.2)
and there are MD strategies
σ
Σ and
π
Π such that
( avg sup σ,π
( avg inf σ, v ) are equal to the value de-
fined by Equality (5.2) (see Gillette [1957], Liggett and Lippman [1969]).
Note that Equality (5.2) does not follow from Equality (5.1) and is
invalid for infinite-state games. To see this, consider an arbitrary sequence
( a i ) i =0 such that a i ∈{
inf π∈ Π
E
) and sup σ∈ Σ
E
v
0 , 1
}
and
n− 1
n− 1
1
n
1
n
A inf = lim inf
n→∞
a i < lim sup
n→∞
a i = A sup .
i =0
i =0
This sequence can be encoded as a game with countably many vertices
v 0 ,v 1 ,... where v i
v i +1 and r ( v i )= a i for all i
0. Obviously, for all
( σ, π )
Σ
×
Π we have that
( avg inf σ,π
( avg sup σ,π
E
)= A inf <A sup =
E
)
v 0
v 0
which means that Equality (5.2) does not hold. Also observe that
val avg inf ( v 0 )
= val avg sup ( v 0 ).
Discounted payoff assigns to each w
Run ( G ) the sum of discounted
rewards
d i
·
r ( w ( i ))
i =0
where 0 <d< 1isa discount factor . Discounting formally captures the
natural idea that the far-away future is not as important as the near future.