often in order to win. In practice, since a single wrong guess makes Player 1 win, this suggests that the probability that Player 2 wins (i.e., makes infinitely many right guesses) is 0, and thus Player 1 can win with probability 1.
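This intuition can be checked numerically. The sketch below models each round as an independent fair coin flip hidden from Player 2 (an assumption made purely for illustration; the function name is ours): against a uniformly randomising Player 1, the probability that Player 2's first n guesses are all correct is (1/2)^n, which vanishes as n grows.

```python
from fractions import Fraction

def survival_probability(n):
    # Probability that Player 2 guesses a uniformly random hidden bit
    # correctly in each of n independent rounds: (1/2)^n.
    return Fraction(1, 2) ** n
```

Since (1/2)^n tends to 0, the probability that Player 2 makes infinitely many right guesses is 0, matching the claim above.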
We now formally define a notion of probabilistic winning. First, a randomised strategy for Player 1 is a function α : (LΣ)*L → D(Σ) where D(Σ) denotes the set of probability distributions over Σ, i.e., the set of all functions f : Σ → [0, 1] such that Σ_{σ∈Σ} f(σ) = 1. Intuitively, if Player 1 uses distribution f, then he plays each action σ with probability f(σ). We assume that Player 1 is informed about which actual actions he played. Hence, strategies are mappings from interleaved sequences of states and actions of the form ρ = ℓ_0 σ_0 ℓ_1 σ_1 … σ_{n−1} ℓ_n that we call labelled histories. We denote by L(ρ) = ℓ_0 ℓ_1 … ℓ_n the projection of ρ onto L. A strategy α is observation-based if for all pairs of labelled histories ρ, ρ′ of the form ρ = ℓ_0 σ_0 ℓ_1 σ_1 … σ_{n−1} ℓ_n and ρ′ = ℓ′_0 σ_0 ℓ′_1 σ_1 … σ_{n−1} ℓ′_n such that for all i, 1 ≤ i ≤ n, obs(ℓ_i) = obs(ℓ′_i), we have that α(ρ) = α(ρ′).
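For finitely many histories, the observation-based condition can be checked mechanically: replace every location in a labelled history by its observation and require that histories with the same resulting signature receive the same distribution. The encoding below (α as a finite table, obs as a map from locations to observations, distributions as dictionaries) is our own illustrative choice, not from the text.

```python
def signature(rho, obs):
    # rho is a tuple (l0, s0, l1, s1, ..., ln) alternating locations and
    # actions; locations sit at even indices. Locations are replaced by
    # their observation, while Player 1's own actions stay visible to him.
    return tuple(obs[x] if i % 2 == 0 else x for i, x in enumerate(rho))

def is_observation_based(alpha, obs):
    # alpha: finite table mapping labelled histories (tuples) to
    # distributions over actions (dicts from actions to probabilities).
    by_sig = {}
    for rho, dist in alpha.items():
        sig = signature(rho, obs)
        if sig in by_sig and by_sig[sig] != dist:
            return False  # two obs-equivalent histories, different outputs
        by_sig[sig] = dist
    return True
```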
A randomised strategy for Player 2 is a function β : (LΣ)⁺ → D(L) such that for all labelled histories ρ = ℓ_0 σ_0 ℓ_1 σ_1 … σ_{n−1} ℓ_n and all σ ∈ Σ, for all ℓ such that β(ρ · σ)(ℓ) > 0, we have (ℓ_n, σ, ℓ) ∈ Δ. We extend in the expected way (using projection of labelled histories onto L when necessary) the notions of observation-based randomised strategies for Player 2, memoryless strategies, consistent plays, outcome, etc.

Given strategies α and β for Player 1 and Player 2, respectively, and an initial location ℓ_0, the probability of a labelled history ρ = ℓ_0 σ_0 ℓ_1 σ_1 … σ_{n−1} ℓ_n is

    P(ρ) = ∏_{i=0}^{n−1} α(ℓ_0 σ_0 … σ_{i−1} ℓ_i)(σ_i) · β(ℓ_0 σ_0 … σ_{i−1} ℓ_i σ_i)(ℓ_{i+1}).

The probability of a history π = ℓ_0 ℓ_1 … ℓ_n is P(π) = Σ_{ρ ∈ L^{−1}(π)} P(ρ), which uniquely defines
the probabilities of measurable sets of (infinite) plays (Vardi [1985]). The
safety, reachability, and parity objectives, being Borel objectives, are measurable (Kechris [1995]). We denote by Pr_ℓ^{α,β}(ϕ) the probability that an objective ϕ is satisfied by a play starting in ℓ in the game G played with strategies α and β. A randomised strategy α for Player 1 in G is almost-surely-winning for the objective ϕ if for all randomised strategies β of Player 2, we have Pr_{ℓ_I}^{α,β}(ϕ) = 1. A location ℓ̂ ∈ L is almost-surely-winning for ϕ if Player 1 has an almost-surely-winning randomised strategy α in the game Ĝ = ⟨L, ℓ̂, Σ, Δ⟩ where ℓ̂ is the initial location.
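The probability P(ρ) of a labelled history defined above is a finite product and can be evaluated directly; summing it over L^{−1}(π) gives P(π). The sketch below assumes α and β are given as Python functions returning dictionaries, and encodes a labelled history as an alternating tuple of locations and actions; these encodings are illustrative assumptions, not part of the formal definitions.

```python
from fractions import Fraction
from itertools import product

def history_probability(rho, alpha, beta):
    # rho = (l0, s0, l1, s1, ..., ln); alpha maps prefixes ending in a
    # location to a distribution over actions, and beta maps prefixes
    # ending in an action to a distribution over locations.
    p = Fraction(1)
    for i in range(1, len(rho), 2):           # indices of the actions s_i
        action, nxt = rho[i], rho[i + 1]
        p *= alpha(rho[:i]).get(action, Fraction(0))
        p *= beta(rho[:i + 1]).get(nxt, Fraction(0))
    return p

def projection_probability(pi, Sigma, alpha, beta):
    # P(pi) = sum of P(rho) over all labelled histories rho with L(rho) = pi,
    # obtained here by enumerating every action sequence over Sigma.
    total = Fraction(0)
    for actions in product(Sigma, repeat=len(pi) - 1):
        rho = [pi[0]]
        for a, loc in zip(actions, pi[1:]):
            rho += [a, loc]
        total += history_probability(tuple(rho), alpha, beta)
    return total
```

For instance, if α is uniform over two actions and β is uniform over two successor locations regardless of the history, every labelled history with two actions has probability 1/16, and the corresponding history ℓ_0 ℓ_1 ℓ_2 has probability 4 · 1/16 = 1/4.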
Note that our definition is again asymmetric in the sense that Player 1
has imperfect information about the location of the game while Player 2
has perfect information. While having perfect information does not help
Player 2 in the case of surely-winning, it makes Player 2 stronger in the
probabilistic case. Recent works (Bertrand et al. [2009], Gripon and Serre
[2009]) study a symmetric setting where the two players have imperfect