Game Development Reference

In-Depth Information

only Nash equilibrium is to always defect; this can be seen by a backwards

induction argument. (The last round is like the one-shot game, so both

players should defect; given that they are both defecting at the last round,

they should both defect at the second-last round; and so on.) This seems quite

unreasonable. And, indeed, in experiments, people do not always defect. In

fact, quite often they cooperate throughout the game. Are they irrational? It

is hard to call this irrational behaviour, given that the 'irrational' players do

much better than supposedly rational players who always defect. There have

been many attempts to explain cooperation in FRPD in the literature (see,

for example, [Kreps et al., 1982]). Indeed, there have even been well-known

attempts that take computation into account; it can be shown that if players

are restricted to using a finite automaton with bounded complexity, then there

exist equilibria that allow for cooperation [Neyman, 1985, Papadimitriou

and Yannakakis, 1994]. However, the strategies used in those equilibria are

quite complex, and require the use of large automata; as a consequence this

approach does not seem to provide a satisfactory explanation of why people

choose to cooperate.

Using the framework described above leads to a straightforward explana-

tion. Consider the
tit for tat
strategy, which proceeds as follows: a player

cooperates at the first round, and then at round
m
+ 1, does whatever his

opponent did at round
m
. Thus, if the opponent cooperated at the previous

round, then you reward him by continuing to cooperate; if he defected at

the previous round, you punish him by defecting. If both players play tit for

tat, then they cooperate throughout the game. Interestingly, tit for tat does

exceedingly well in FRPD tournaments, where computer programs play each

other [Axelrod, 1984].

Tit for tat is a simple program, which needs very little memory. Suppose

that we charge even a modest amount for memory usage, and that there is a

discount factor
δ
, with 0
.
5
<δ<
1, so that if the player gets a reward of
r
m

in round
m
, his total reward over the whole
N
-round game (not including

the cost of memory usage) is taken to be
m
=1
δ
m
r
m
. In this case, it is

easy to see that, no matter what the cost of memory is, as long as it is

positive, for a su
ciently long game, it will be a Nash equilibrium for both

players to play tit for tat. For the best response to tit for tat is to play tit

for tat up to the last round, and then to defect. But following this strategy

requires the player to keep track of the round number, which requires the

use of extra memory. The extra gain of $2 achieved by defecting at the last

round, if su
ciently discounted, will not be worth the cost of keeping track of

the round number. (A similar argument works without assuming a discount

Search Nedrilad ::

Custom Search