Game Development Reference
In-Depth Information
only Nash equilibrium is to always defect; this can be seen by a backwards
induction argument. (The last round is like the one-shot game, so both
players should defect; given that they are both defecting at the last round,
they should both defect at the second-last round; and so on.) This seems quite
unreasonable. And, indeed, in experiments, people do not always defect. In
fact, quite often they cooperate throughout the game. Are they irrational? It
is hard to call this irrational behaviour, given that the 'irrational' players do
much better than supposedly rational players who always defect. There have
been many attempts to explain cooperation in FRPD in the literature (see,
for example, [Kreps et al., 1982]). Indeed, there have even been well-known
attempts that take computation into account; it can be shown that if players
are restricted to using a finite automaton with bounded complexity, then there
exist equilibria that allow for cooperation [Neyman, 1985, Papadimitriou
and Yannakakis, 1994]. However, the strategies used in those equilibria are
quite complex, and require the use of large automata; as a consequence this
approach does not seem to provide a satisfactory explanation of why people
choose to cooperate.
Using the framework described above leads to a straightforward explana-
tion. Consider the tit for tat strategy, which proceeds as follows: a player
cooperates at the first round, and then at round m + 1, does whatever his
opponent did at round m . Thus, if the opponent cooperated at the previous
round, then you reward him by continuing to cooperate; if he defected at
the previous round, you punish him by defecting. If both players play tit for
tat, then they cooperate throughout the game. Interestingly, tit for tat does
exceedingly well in FRPD tournaments, where computer programs play each
other [Axelrod, 1984].
Tit for tat is a simple program, which needs very little memory. Suppose
that we charge even a modest amount for memory usage, and that there is a
discount factor δ , with 0 . 5 <δ< 1, so that if the player gets a reward of r m
in round m , his total reward over the whole N -round game (not including
the cost of memory usage) is taken to be m =1 δ m r m . In this case, it is
easy to see that, no matter what the cost of memory is, as long as it is
positive, for a su ciently long game, it will be a Nash equilibrium for both
players to play tit for tat. For the best response to tit for tat is to play tit
for tat up to the last round, and then to defect. But following this strategy
requires the player to keep track of the round number, which requires the
use of extra memory. The extra gain of \$2 achieved by defecting at the last
round, if su ciently discounted, will not be worth the cost of keeping track of
the round number. (A similar argument works without assuming a discount