Game Development Reference
In-Depth Information
a, b
b
a
a
b
0
1
2
Figure 6.5 A reachability game G
6.4 Games with imperfect information:
almost-surely-winning
We revisit the 3-coin game. In Exercise 6.11, we have seen that Player 1
does not have an observation-based deterministic winning strategy in this
game when Player 2 is allowed to exchange the position of the coins that are
not toggled. This is because Player 2 can guess the choice that Player 1 will
make in the next round. When a deterministic strategy for Player 1 is fixed,
this information is formally available to Player 2 but this is not realistic in
practice. Player 1 should use a source of randomisation in order to avoid
the possibility that Player 2 can guess the choice she will make in the next
round. Whenever the game is in a configuration with two heads, Player 1
chooses uniformly at random one of the three coins. Clearly the probability
to choose the coin showing tail is
1
3 no matter if Player 2 has decided to
exchange the coins or not at the previous step. Otherwise, she should play
the same coin a second time to make sure she comes back to a configuration
with two heads. She then repeats the same randomised strategy. Every two
rounds, Player 1 has a
1
3 probability to reach the winning configuration.
Note also that she is sure to avoid the losing configuration (all coins on
tails). This simple strategy is thus winning the reachability objective with
probability one. This illustrates the power of randomised strategies in games
with imperfect information.
6.4.1 Playing with randomised strategies
Before going into the formalisation, let us take a look at the example of
Figure 6.3. From the initial location 1 , we have seen that Player 1 has no
surely-winning strategy for reaching 4 . This is because for all strategies α
of Player 1, there exists a play π ∈ Outcome 1 ( G, α ) that visits 3 infinitely
often, and therefore never visits 4 . However, the strategy β of Player 2 such
that π = outcome( G, α, β ) chooses the successor ˆ of 1 in a way that depends
on the next move of Player 1, namely ˆ = 2 if α plays action a next, and
ˆ = 2 if α plays action b next. In a concrete implementation of the system,
this means that Player 2 needs to predict the behaviour of Player 1 infinitely