
empirically tested among game theorists (with p = 2) they typically did not play anywhere close to 2. Becker, Carter, and Naeve [2005] asked members of the Game Theory Society (presumably, all experts in game theory) to submit a strategy for the game. Fifty-one of them did so. Of the 45 that submitted pure strategies, 33 submitted a strategy of 95 or higher, and 38 submitted a strategy of 90 or higher; only three submitted the 'recommended' strategy of 2. The strategy that performed best (in pairwise matchups against all submitted strategies) was 97, which had an average payoff of $85.09. The worst average payoff went to those who played 2; it was only $3.92.

Another sequence of experiments by Capra et al. [1999] showed, among other things, that this result was quite sensitive to the choice of p. For low values of p, people tended to play high values, and keep playing them when the game was repeated. By way of contrast, for high values of p, people started much lower, and converged to playing 2 after a few rounds of repeated play. The standard solution concepts (Nash equilibrium, rationalizability, etc.) are all insensitive to the choice of p; for example, (2, 2) is the only Nash equilibrium for all choices of p > 1.
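The game under discussion is the Traveler's Dilemma, whose rules are not restated in this excerpt; the sketch below assumes its standard formulation (each player claims an integer in {2, ..., 100}; both receive the lower claim, with a bonus of p to the lower claimant and a penalty of p to the higher one) and checks the equilibrium claim directly.

```python
# Sketch of the standard Traveler's Dilemma payoff (assumed formulation,
# not spelled out in the text): claims lie in {2, ..., 100}; both players
# receive the lower claim m, plus a bonus p for the lower claimant and
# minus a penalty p for the higher one.

def payoff(a, b, p):
    """Payoff to the player claiming a, against an opponent claiming b."""
    if a == b:
        return a
    m = min(a, b)
    return m + p if a < b else m - p

def is_nash(a, b, p, claims=range(2, 101)):
    """True iff neither player gains by a unilateral deviation from (a, b)."""
    return (all(payoff(a2, b, p) <= payoff(a, b, p) for a2 in claims) and
            all(payoff(b2, a, p) <= payoff(b, a, p) for b2 in claims))

p = 2
print(is_nash(2, 2, p))      # (2, 2) is an equilibrium
print(is_nash(100, 100, p))  # (100, 100) is not: deviating to 99 pays 101
```

Enumerating all deviations this way confirms the text's point: (2, 2) is the only profile that survives, even though high claims do far better empirically.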

Arguably, part of the problem here is that Nash equilibrium presumes that players know what other players are doing. I now consider a solution concept that Rafael Pass and I [2009] recently introduced, iterated regret minimisation, that attempts to deal with this problem, and exhibits the same qualitative behaviour as that observed in experiments in many games of interest. (The discussion in the rest of this section is taken almost verbatim from [Halpern and Pass, 2009].)

The idea of minimising regret was introduced (independently) in decision theory by Niehans [1948] and Savage [1951]. To explain how we use it in a game-theoretic context, I first review how it works in a single-agent decision problem. Suppose that an agent chooses an act from a set A of acts. The agent is uncertain as to the true state of the world; there is a set S of possible states. Associated with each state s ∈ S and act a ∈ A is the utility u(a, s) of performing act a if s is the true state of the world. For simplicity, we take S and A to be finite here. The idea behind the minimax regret rule is to hedge the agent's bets, by doing reasonably well no matter what the actual state is.

Formally, for each state s, let u∗(s) be the best outcome in state s; that is, u∗(s) = max_{a ∈ A} u(a, s). The regret of a in state s, denoted regret_u(a, s), is u∗(s) − u(a, s); that is, the regret of a in s is the difference between the utility of the best possible outcome in s and the utility of performing act a in s.

Let regret_u(a) = max_{s ∈ S} regret_u(a, s). For example, if regret_u(a) = 2, then in each state s, the utility of performing a in s is guaranteed to be within 2 of u∗(s), the best that could be achieved in s.
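The definitions above translate directly into a few lines of code. This is a minimal sketch of the minimax-regret rule for a finite decision problem; the acts, states, and utility numbers are invented purely for illustration.

```python
# Minimax regret for a finite decision problem: utilities are given as a
# dict u[(act, state)]. The acts, states, and numbers are illustrative.

def minimax_regret(acts, states, u):
    # u_star[s] = max over acts a of u(a, s): best achievable utility in s
    u_star = {s: max(u[(a, s)] for a in acts) for s in states}
    # regret(a) = max over states s of (u_star[s] - u(a, s))
    regret = {a: max(u_star[s] - u[(a, s)] for s in states) for a in acts}
    # pick the act whose worst-case regret is smallest
    return min(acts, key=lambda a: regret[a]), regret

acts, states = ["a1", "a2"], ["s1", "s2"]
u = {("a1", "s1"): 10, ("a1", "s2"): 0,
     ("a2", "s1"): 6,  ("a2", "s2"): 5}
best, regret = minimax_regret(acts, states, u)
# a1 has regret max(0, 5) = 5; a2 has regret max(4, 0) = 4, so a2 is chosen
```

Note that a2 is chosen even though a1 has the higher best-case payoff: minimax regret hedges against the state in which an act does worst relative to what was achievable.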
