Game Development Reference
In-Depth Information
of the utility of any act the agent could choose, even if she knew that the
actual state was s . The minimax-regret decision rule orders acts by their
regret; the 'best' act is the one that minimises regret. Intuitively, this rule is
trying to minimise the regret that an agent would feel if she discovered what
the situation actually was: the 'I wish I had chosen a instead of a ' feeling.
Despite having been used in decision making for over 50 years, up until
recently, there was little work on applying regret minimisation in the context
of game theory. Rather than giving formal definitions here, I explain how
regret can be applied in the context of the traveller's dilemma. For ease of
exposition, I restrict attention to pure strategies.
The acts for each player are that player's pure strategy choices; the states
are the other player's pure strategy choices. Each act-state pair is then just a
strategy profile; the utility of the act-state pair for player i is just the payoff
to player i of the strategy profile. Intuitively, each agent is uncertain about
what the other agent will do, and tries to choose an act that minimises his
regret, given that uncertainty.
It is easy to see that, if the penalty/reward p is such that 2
p
49,
then the acts that minimise regret are the ones in the interval [100
2 p, 100];
the regret for all these acts is 2 p
1. For if player 1 asks for an amount
2 p, 100] and player 2 asks for an amount m
m
[100
m , then the payoff
to player 1 is at least m
p , compared to the payoff of m + p
1 (or just
m if m = 2) that is achieved with the best response; thus, the regret is at
most 2 p
1 in this case. If, instead, player 2 asks for m >m , then player 1
gets a payoff of m + p , and the best possible payoff in the game is 99 + p ,so
his regret is at most 99
m
2 p
1. On the other hand, if player 1 chooses
m< 100
2 p , then his regret will be 99
m> 2 p
1 if player 2 plays 100.
This proves the claim. If p
50, then the unique act that minimises regret
Suppose that 2
49. Applying regret minimisation once suggests
using a strategy in the interval [100
p
2 p, 100]. But we can iterate this
process. If we assume that both players use a strategy in this interval, then
the strategy that minimises regret is that of asking for \$(100 2 p + 1). A
straightforward check shows that this has regret 2 p − 2; all other strategies
have regret 2 p − 1. If p = 2, this approach singles out the strategy of asking
for \$97, which was found to be the best strategy by Becker, Carter, and
Naeve [2005]. As p increases, the act that survives this iterated deletion
process goes down, reaching 2 if p
50. This matches (qualitatively) the
findings of Capra et al. [1999]. (Capra et al. actually considered a slightly
different game where the minimum bid was p (rather than 2). If we instead
consider their version of the game, we get an even closer qualitative match