Game Development Reference

In-Depth Information

of the utility of any act the agent could choose, even if she knew that the

actual state was
s
. The minimax-regret decision rule orders acts by their

regret; the 'best' act is the one that minimises regret. Intuitively, this rule is

trying to minimise the regret that an agent would feel if she discovered what

the situation actually was: the 'I wish I had chosen
a
instead of
a
' feeling.

Despite having been used in decision making for over 50 years, up until

recently, there was little work on applying regret minimisation in the context

of game theory. Rather than giving formal definitions here, I explain how

regret can be applied in the context of the traveller's dilemma. For ease of

exposition, I restrict attention to pure strategies.

The acts for each player are that player's pure strategy choices; the states

are the other player's pure strategy choices. Each act-state pair is then just a

strategy profile; the utility of the act-state pair for player
i
is just the payoff

to player
i
of the strategy profile. Intuitively, each agent is uncertain about

what the other agent will do, and tries to choose an act that minimises his

regret, given that uncertainty.

It is easy to see that, if the penalty/reward
p
is such that 2

≤

p

≤

49,

then the acts that minimise regret are the ones in the interval [100

−

2
p,
100];

the regret for all these acts is 2
p

−

1. For if player 1 asks for an amount

2
p,
100] and player 2 asks for an amount
m
≤

m

∈

[100

−

m
, then the payoff

to player 1 is at least
m
−

p
, compared to the payoff of
m
+
p

1 (or just

m
if
m
= 2) that is achieved with the best response; thus, the regret is at

most 2
p

−

1 in this case. If, instead, player 2 asks for
m
>m
, then player 1

gets a payoff of
m
+
p
, and the best possible payoff in the game is 99 +
p
,so

his regret is at most 99

−

−

m

≤

2
p

−

1. On the other hand, if player 1 chooses

m<
100

−

2
p
, then his regret will be 99

−

m>
2
p

−

1 if player 2 plays 100.

This proves the claim. If
p

≥

50, then the unique act that minimises regret

is asking for $2.

Suppose that 2

49. Applying regret minimisation once suggests

using a strategy in the interval [100

≤

p

≤

2
p,
100]. But we can iterate this

process. If we assume that both players use a strategy in this interval, then

the strategy that minimises regret is that of asking for $(100
−
2
p
+ 1). A

straightforward check shows that this has regret 2
p −
2; all other strategies

have regret 2
p −
1. If
p
= 2, this approach singles out the strategy of asking

for $97, which was found to be the best strategy by Becker, Carter, and

Naeve [2005]. As
p
increases, the act that survives this iterated deletion

process goes down, reaching 2 if
p

−

50. This matches (qualitatively) the

findings of Capra et al. [1999]. (Capra et al. actually considered a slightly

different game where the minimum bid was
p
(rather than 2). If we instead

consider their version of the game, we get an even closer qualitative match

≥

Search Nedrilad ::

Custom Search