Game Development Reference
Brandenburger, Friedenberg, and Keisler resolve this paradox in the
context of iterated deletion of weakly dominated strategies by assuming that
strategies were not really eliminated. Rather, they assumed that strategies
that are weakly dominated occur with infinitesimal (but non-zero) probability.
Unfortunately, this approach does not seem to help in the context of iterated
regret minimisation. Assigning deleted strategies infinitesimal probability
will not make 97 a best response to a set of strategies where 97 is given very
high probability. Pass and I deal with this problem by essentially reversing
the approach taken by Brandenburger, Friedenberg, and Keisler. Rather
than assuming common knowledge of rationality, we assign successively
lower probability to higher orders of rationality. The idea is that now, with
overwhelming probability, no assumptions are made about the other players;
with probability ε, they are assumed to be rational; with probability ε²,
the other players are assumed to be rational and to believe that they are
playing rational players, and so on. (Of course, 'rationality' is interpreted
here as minimising expected regret.) Thus, players proceed lexicographically.
Their first priority is to minimise regret with respect to all strategies; their
next priority is to minimise regret with respect to strategies that a rational
player would use; and so on. For example, in the traveller's dilemma, all the
choices between 96 and 100 minimise regret with respect to all strategies.
To choose among them, we consider the second priority: minimising regret
with respect to strategies that a rational player would use. Since a rational
player (who is minimising regret) would choose a strategy between 96 and
100, and 97 minimises regret with respect to these strategies, 97 is preferred
to the other strategies between 96 and 100. In [Halpern and Pass, 2009], this
intuition is formalised, and a formal epistemic characterisation is provided
for iterated regret minimisation. This characterisation emphasises the fact
that this approach makes minimal assumptions about the strategies used by
the other agent.
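The traveller's dilemma calculation above can be checked mechanically. The
following is a minimal sketch, not the formal construction of [Halpern and
Pass, 2009]. It assumes the standard version of the game (integer claims from
2 to 100, with a reward/penalty of 2), which this passage does not restate,
and it considers pure strategies only. Because the game is symmetric, a single
strategy set is tracked for both players. The function names are illustrative.

```python
def payoff(a, b, p=2):
    """Payoff to a player claiming a when the other player claims b."""
    if a == b:
        return a
    # The lower claimant gets the lower claim plus p; the higher gets it minus p.
    return a + p if a < b else b - p


def min_regret_set(strategies, p=2):
    """Return (strategies with minimal worst-case regret, that regret).

    Regret of a against b is the best payoff achievable against b, using
    only strategies in the current set, minus the payoff a actually gets.
    """
    best = {b: max(payoff(a, b, p) for a in strategies) for b in strategies}
    regret = {
        a: max(best[b] - payoff(a, b, p) for b in strategies)
        for a in strategies
    }
    lowest = min(regret.values())
    return sorted(a for a in strategies if regret[a] == lowest), lowest


def iterated_regret_minimisation(strategies, p=2):
    """Repeat regret minimisation until the surviving set stops shrinking."""
    rounds = []
    current = sorted(strategies)
    while True:
        survivors, lowest = min_regret_set(current, p)
        rounds.append((survivors, lowest))
        if survivors == current:
            return rounds
        current = survivors


rounds = iterated_regret_minimisation(range(2, 101))
for number, (survivors, lowest) in enumerate(rounds, start=1):
    print(f"round {number}: regret {lowest}, survivors {survivors}")
```

Under these assumptions, the first round leaves the claims 96 to 100 (each
with regret 3), and the second round leaves only 97 (with regret 2), matching
the two priorities described above.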
Of course, an agent may have some beliefs about the strategies used by
other agents. These beliefs can be accommodated by allowing the agent to
start the deletion process with a smaller set of strategies (the ones that he
considers the other players might actually use). The changes required to deal
with this generalisation are straightforward.
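To illustrate how prior beliefs change the outcome, the sketch below computes
only the first step of the process: the agent's regret-minimising claims in
the traveller's dilemma, relative to a set of claims that he believes the
other player might make. It is a sketch under assumptions, namely the
standard game (claims from 2 to 100, reward/penalty of 2) and pure strategies
only. The function names and the example belief sets are hypothetical.

```python
def payoff(a, b, p=2):
    """Payoff to a player claiming a when the other player claims b."""
    if a == b:
        return a
    return a + p if a < b else b - p


def min_regret_against(own, believed, p=2):
    """Own strategies that minimise worst-case regret over `believed`.

    `believed` is the set of strategies that the agent considers the
    other player might actually use.
    """
    own, believed = list(own), list(believed)
    # Best achievable payoff against each strategy the opponent might use.
    best = {b: max(payoff(a, b, p) for a in own) for b in believed}
    regret = {a: max(best[b] - payoff(a, b, p) for b in believed) for a in own}
    lowest = min(regret.values())
    return sorted(a for a in own if regret[a] == lowest)


claims = range(2, 101)
no_beliefs = min_regret_against(claims, claims)      # no assumptions at all
sure_of_100 = min_regret_against(claims, [100])      # opponent surely claims 100
low_claims = min_regret_against(claims, range(2, 51))  # opponent claims at most 50
print(no_beliefs, sure_of_100, low_claims[0], low_claims[-1])
```

With no restriction on the other player, the result is the set 96 to 100. If
the agent is certain that the other player claims 100, minimising regret
reduces to best-responding, and the result is 99. If he believes that the
other player claims at most 50, every claim from 46 to 100 minimises regret.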
Example 8.4 The role of prior beliefs is particularly well illustrated in
the finitely repeated prisoner's dilemma. Recall that always defecting is the
only Nash equilibrium of FRPD; it is also the only strategy that is
rationalizable, and the only one that survives iterated deletion of weakly dominated
strategies. Nevertheless, in practice, we see quite a bit of cooperation. We