Evolutionary stability in the infinitely repeated prisoners' dilemma played by two-state Moore machines.
Linster, Bruce G.
I. Introduction
A large literature which attempts to explain how cooperative outcomes can be supported in the repeated Prisoners' Dilemma (RPD) has emerged in recent years. An interesting branch of this literature has analyzed the infinite RPD as a game in which two "metaplayers" choose a strategy which is implemented by a finite automaton, or Moore machine. This line of research has been especially interesting because it allows us to capture the notion of "bounded" rationality in the players. Also, the use of these machines as devices to implement strategies permits us to quantify the notion of strategic complexity. We can then apply the idea that complexity is costly to the analysis, and the results have proven very interesting.
So far, the work in this area has applied the concepts of Nash equilibrium and evolutionary stability in determining what reasonable outcomes are in these games. However, it has frequently been pointed out that these equilibrium concepts may be either too restrictive or not restrictive enough to be useful in analyzing the infinitely repeated Prisoners' Dilemma. That is, if we use the Nash equilibrium as the appropriate equilibrium concept, the set of equilibrium outcomes is infinitely large. On the other hand, if we require the equilibrium to be evolutionarily stable, the set of such equilibrium outcomes is frequently empty. It has been suggested that the requirement for evolutionary stability may be too stringent a criterion.
The purpose of this essay is to examine what happens in these games in an evolutionary framework under various conditions. The results of the simulations I performed and report here indicate certain combinations of strategies may exist which, although not evolutionarily stable in the sense of Maynard Smith [14], prove to be invasion-proof against permissible mutant strategies. This is akin to saying a set of possible strategies may exist which cannot be invaded, but neither the individual strategies nor the mixed strategy represented by the population is evolutionarily stable. We can imagine the mix of strategies changing as various mutants attempt to invade the population but the set of strategies remaining the same as the one with which we started. In fact, it may well be the case that this set of strategies will not last forever, but it may last for a very long time. I formalize this idea later in the essay. In order to test the robustness of the hypothesis, I simulated a number of different mutation schemes to examine what sort of outcome we should expect to find in a population similar to the one I explore.
I begin by reviewing the applicable literature and discussing the philosophy underlying this exercise. Then I describe the set of automata I consider as well as the various evolutionary models I simulated. As always, the results we obtain depend upon the assumptions of the model. The assumptions in this case include the set of permissible strategy-implementing machines and the rules for how these strategies evolve over time. This line of research differs from other attempts to capture the results of an evolutionary story in two important ways. First, since the strategies I consider are all those which can be implemented by an automaton with no more than two states, there is no predisposition, either intentional or unintentional, toward a given outcome. The claim can be made that the strategies which were entered in Robert Axelrod's now famous Prisoners' Dilemma tournament [3] were in some sense biased toward the TIT-FOR-TAT (TFT) type strategies because Axelrod identified TFT as the most successful strategy in a preliminary round of the tournament. The simulations I performed are not predisposed toward a particular outcome except through the evolutionary processes I use, and these are made explicit so we can either accept them or reject them on reasonable grounds. Second, I implement different schemes for mutation which seem plausible in a sense I will explain later. Using Axelrod's tournament as an example again, we can imagine mutation entering his dynamic ecological process. Such mutant strategies could easily affect the final population in significant ways. However, what a reasonable mutant looks like in Axelrod's framework is not clear. In this essay, I formalize how the strategies are modeled so we can then evaluate the results based on the underlying assumptions.
II. Cooperation and the Infinite RPD
The Prisoners' Dilemma is a well-known bimatrix game which has been used to study such economic problems as oligopolistic collusion, international trade, and public goods provision. The players in the game must simultaneously choose between "Cooperate" (C) and "Defect" (D). The payoffs I will use in this analysis are described in Figure 1. Regardless of what one's opponent does, a player does better by defecting. This means (D,D) is the only Nash equilibrium in the one-shot game. The dilemma is that if the players could be induced to cooperate, they would both do better than if they received the equilibrium payoff. In fact, any finitely repeated version of the Prisoners' Dilemma has defection at every stage as the only perfect equilibrium outcome. It is not until we consider the infinitely repeated game that cooperation can be sustained. The "Folk Theorem" of repeated games tells us any individually rational payoff vector can be the outcome of a perfect equilibrium of an infinitely repeated game when the payoffs are calculated as "the limit of the mean."(1)
John Maynard Smith and G. R. Price [15] are among those credited with the introduction of evolutionary game theory, but perhaps the best known of the theoretical works on the subject is Maynard Smith's 1982 book Evolution and the Theory of Games. Axelrod's [3] analysis of the success of cooperation in the repeated Prisoners' Dilemma (RPD) is probably the most famous of many studies of the game in an evolutionary framework. Evolutionary game theory has frequently been used to study coordination games, common interest games, and the Prisoners' Dilemma. Since its introduction many people in a broad range of disciplines have applied it to the study of natural and social science problems. Here I provide a brief review of some of those studies, especially those analyses which deal with the Prisoners' Dilemma.
Robert Axelrod, in his 1984 book The Evolution of Cooperation, suggested cooperation is evolutionarily stable in the RPD. However, the definition of evolutionary stability he employed differs from that used by Maynard Smith [14]. Specifically, Axelrod applied a concept he called "collective stability" which requires only that a strategy be a Nash equilibrium when it plays itself. In choosing this definition, Axelrod ruled out of consideration the possibility that an alternate best reply strategy, which earns an equal payoff against both the indigenous and invading strategies, would be able to successfully infiltrate the native population.
This idea is embodied in the formal definition of an evolutionarily stable strategy (ESS). A strategy $\sigma$, which can be either a pure or mixed strategy, is an ESS of a symmetric game if and only if, for all $\sigma' \neq \sigma$: (i) $u(\sigma, \sigma) \geq u(\sigma', \sigma)$, and (ii) if $u(\sigma, \sigma) = u(\sigma', \sigma)$, then $u(\sigma, \sigma') > u(\sigma', \sigma')$, where $u(\sigma_1, \sigma_2)$ represents the payoff to a player if he plays strategy $\sigma_1$ and his opponent chooses strategy $\sigma_2$. In words, these conditions require an ESS be a best reply to itself, and if there are alternate best replies, the ESS must do strictly better against an alternate best reply than the alternate best reply does against itself. The first requirement above means any ESS is a Nash equilibrium. However, the converse is not true because of the second condition, often called the "stability condition," which Axelrod eliminated as part of his definition of evolutionary stability [3, 217]. Therefore, it is in this weaker sense that TFT is stable. There were a number of possible strategies which met this requirement but did not do well in Axelrod's tournament.
The above definition of an ESS is actually a characterization of a more basic requirement which holds in pairwise random matching models.(2) Assuming von Neumann-Morgenstern utility functions, the following must hold for every $\sigma' \neq \sigma$ and all sufficiently small proportions of invaders, $\epsilon > 0$:
$(1 - \epsilon)\,u(\sigma, \sigma) + \epsilon\,u(\sigma, \sigma') > (1 - \epsilon)\,u(\sigma', \sigma) + \epsilon\,u(\sigma', \sigma').$
This requires that the expected payoff from playing $\sigma$ in a population with proportion $\epsilon$ of $\sigma'$ is strictly greater than the expected utility of those playing $\sigma'$. Loosely speaking, then, an ESS is a strategy which cannot be invaded by an arbitrarily small proportion of potential invading strategies.
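To make the two ESS conditions concrete, here is a minimal sketch (in Python; the paper's own simulations were written in Turbo Pascal) that checks whether a pure strategy is an ESS of a finite symmetric game given its payoff matrix. The stage payoffs are assumptions for illustration: the payoffs of 3 for mutual cooperation and 1 for mutual defection are cited later in the text, while the temptation and sucker payoffs of 4 and 0 are my own stand-ins for the omitted Figure 1.

```python
import numpy as np

def is_ess(u, i):
    """Test whether pure strategy i is an ESS of the symmetric game with
    payoff matrix u, where u[a, b] is the payoff to strategy a against b."""
    n = u.shape[0]
    for j in range(n):
        if j == i:
            continue
        if u[j, i] > u[i, i]:
            return False          # i is not a best reply to itself
        if u[j, i] == u[i, i] and u[i, j] <= u[j, j]:
            return False          # i fails the stability condition
    return True

# One-shot Prisoners' Dilemma with assumed payoffs (0 = Cooperate, 1 = Defect).
pd = np.array([[3.0, 0.0],
               [4.0, 1.0]])
print(is_ess(pd, 1))  # True: Defect is a strict Nash equilibrium, hence an ESS
print(is_ess(pd, 0))  # False: Cooperate is not a best reply to itself
```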
Robert Boyd and Jeffrey Lorberbaum [7] have shown no pure strategy is evolutionarily stable in this more restricted sense in the infinite RPD. Specifically, they show a population of TFT players can be invaded by the appropriate mixture of Suspicious TIT-FOR-TAT, the strategy which defects on the first move and then reciprocates the opponent's move, and TIT-FOR-TWO-TATS (TF2T), which reciprocates defection only after two consecutive defections by the opponent. Moreover, they show that if every strategy is possible through mutation, no pure strategy is immune to invasion by some mixture of strategies. However, they suggest that not every strategy will be possible through mutation, so cooperation based on reciprocity may indeed flourish. Joseph Farrell and Roger Ware [8] extended the work of Boyd and Lorberbaum to show that for any evolutionarily stable mixture of strategies in the infinite RPD, every finite history must occur with positive probability. The implication of this result is that no finite mixture of strategies is evolutionarily stable in the infinite RPD. They use this result as evidence the ESS concept is too stringent a criterion to be useful.
More recently, Yong-Gwan Kim [12] showed that an ESS in the infinite RPD does not exist unless we allow perturbations. He then applied Reinhard Selten's [20] concept of "limit ESS." This concept expands the set of evolutionarily stable outcomes by allowing perturbations which may put some minimum probability on other strategies in the game. In this way, ties which arise may be broken to allow for more equilibria.(3) Kim was able to prove a "Folk Theorem" of sorts which states we can find a limit ESS outcome which is arbitrarily close to any convex combination of the payoffs from two purely defecting strategies and two purely cooperating strategies.
If we limit the set of possible strategies to those which can be modeled as finite automata, or Moore machines, we can include measures of complexity in the players' utility functions. This has led to interesting results and has only recently been used to make evolutionary arguments. It was Robert Aumann [2] who first suggested modeling strategies as machines with a finite number of states as a way to handle bounded rationality. Abraham Neyman [17] showed that if we model strategies this way and limit the number of states exogenously, we can find equilibrium machines which cooperate at every stage of the game if the number of states is less than the number of iterations of the game. The intuition here is easy to see. Suppose the Prisoners' Dilemma will be repeated one hundred times. As long as the machines which implement the strategies have fewer than one hundred states, cooperation can be maintained by the GRIM strategy.(4) In order to beat the GRIM strategy, an opponent must be able to identify the last stage of the game so it can defect. Since it requires one hundred states to count that high, any machine with fewer than one hundred states cannot improve on the payoff to playing GRIM. Roy Radner [18] used a similar argument to show how cooperation can be maintained using strategies which are implemented by finite automata which are similarly bounded in size.
Dilip Abreu and Ariel Rubinstein [1] have studied models where complexity, which in their model is defined to be the number of states in the automaton, is endogenously determined. The metagame strategies are implemented by finite automata, and the complexity of the machine enters the players' decisions by making the metagame payoffs depend positively on stage game payoffs and negatively on complexity. By this I mean that if two machines yield the same stage game payoffs, the machine with fewer states has higher metagame payoffs. One model they analyzed had complexity enter the metagame payoffs lexicographically. Abreu and Rubinstein were able to reduce the set of equilibrium outcomes to the rational payoff vectors on the main and alternate diagonals of the set of feasible outcomes which provide each player with more than his security level. The interpretation Abreu and Rubinstein put on their model is that of a rational decision maker who must choose someone to play the game for him. However, the player who implements the strategy can only carry out simple instructions. We can think of these decision rules as "rules of thumb" which evolve over time. Another way to think of this is to assign some evolutionary process the role of the metaplayer which selects the strategies. It is this interpretation which I will explore.
Ken Binmore and Larry Samuelson [5] have done interesting work recently using the model developed by Abreu and Rubinstein. They show that if we consider the same type of utility functions used by Abreu and Rubinstein, any evolutionarily stable strategy has both players earning the cooperative, or utilitarian,(5) payoff. In other words, Abreu and Rubinstein refine the set of equilibrium payoffs by considering the complexity of the implementing machine, and Binmore and Samuelson further refine this set to one outcome with an evolutionary stability argument. This result depends crucially on the definition of complexity we choose. Jeffrey Banks and Rangarajan Sundaram [4] have shown that if we use preferences which are lexicographic in complexity and use the number of transitions in the Moore machine as the measure of complexity, the only Nash equilibrium machine is one which always defects.
These ideas are summarized in Figure 2. The equilibrium payoffs allowed by the "Folk Theorem" are all those vectors in the shaded region. The equilibrium payoff set from the Abreu/Rubinstein models contains the rational payoff vectors on the main and alternate diagonals which assure each player a payoff greater than one. Binmore and Samuelson's evolutionary model reduces the set to the unique point (3,3), and the Banks/Sundaram model yields the one equilibrium outcome in which both players always defect and receive the payoff vector (1,1).
Selecting the "right" equilibrium is frequently the result of some ad hoc decision rule. For example, many economists and game theorists argue that when multiple equilibria exist the solution to the game should have the players achieving an efficient outcome. Using the terminology of Harsanyi [10] and Harsanyi and Selten [11], the players should choose this outcome based on payoff dominance(6) if possible. This intuition is especially appealing for repeated games since there is recurrent interaction between the players. The argument for selecting payoff dominant equilibrium outcomes is especially strong when some sort of coordinating device is present. For example, if we allow preplay communication between players, we should expect the equilibrium they agree upon will be payoff dominant if possible. Even in the absence of such a coordinating device, many game theorists and economists expect to see the payoff dominant equilibrium outcome. In the infinite RPD, of course, this means we should expect to see a cooperative outcome. However, if such selection criteria are to be meaningful, they should be based on first principles and not chosen because they give nice results.
This area of game theory has just recently become the target of significant research. The results so far have proven interesting but not conclusive. The work I described by Binmore and Samuelson is the first research in this area. They prove that any symmetric equilibrium strategy which yields less than the cooperative payoff can be invaded by a strategy which does as well, in stage game payoffs, as a native does when it plays a native, but which does better against itself than it does against a native. The invading machines are able to send a signal to their opponents. If they are not playing their own type, it makes no difference in the metagame payoffs because payoffs are computed as the limit of the mean, and in the limit they get the same payoff as the native strategy. However, they earn greater average payoffs than the native strategy because they do better when playing strategies like themselves. The idea behind signalling to opponents is nothing new. Arthur Robson [19] speaks of a "secret handshake" which identifies members of a certain group to each other.
Drew Fudenberg and Eric Maskin [9] have shown, independently of Binmore and Samuelson, that cooperation in RPD games is evolutionarily stable in an environment where only strategies of finite complexity can be used, metagame payoffs are calculated as the limit of the mean, and noise exists. In their model, the term "noise" refers to some chance an action may be misperceived by a player's opponent. John Miller [16] performed interesting simulations of the evolution of automata which were modeled as strings of digits. The impact of noise on the model was one of the factors he examined. He simulated the evolution of a population playing a finite RPD in which there were strictly positive probabilities of errors in the transmission of information about what a player's opponent did on the previous move.(7) His results are difficult to characterize in a sentence or two; however, the methodology he used is interesting and the results indicate the effects of noise on a model like this are not negligible.
There is no definitive theoretical result explaining why a certain outcome should emerge from some evolutionary process. We are caught between arguments like those of Farrell and Ware, which indicate no strategy or mix of strategies is evolutionarily stable in the infinite RPD, and those which identify one particular strategy as the only reasonable result of an evolutionary process. With this in mind, I intend to explore the evolution of strategies which can be implemented by two-state Moore machines in the infinite RPD. Axelrod's work has been the standard in this area. However, as I pointed out earlier, his results may have been part of a self-fulfilling prophecy in which TFT was identified as the most successful strategy. Another problem with Axelrod's work is that the next generation's population was determined only by the replication dynamic process, and new strategies were not allowed to enter. In other words, the strategies were assumed to be perfect in their ability to breed true. In the next section, I describe the basics of the simulations I performed: how I modeled the strategies, how I chose to model possible mutations, and how I attempted to capture the idea of strategic complexity.
III. The Model
In this essay, I consider only two-player games. The underlying game is the version of the Prisoners' Dilemma described in Figure 1. More formally, I will consider the game $G$ specified by the four-tuple $\langle S_1, S_2, \Pi_1, \Pi_2 \rangle$, where $S_i = \{C, D\}$ is the strategy set and $\Pi_i : S_1 \times S_2 \to \mathbb{R}$ is the payoff function. These elements are summarized in the bimatrix form of the game.
The infinitely repeated game $G^\infty = \langle R_1, R_2, P_1, P_2 \rangle$ is constructed with $G$ as the stage game. Each player's strategy space, $R_i$, is the set of functions which map any history of play into $S_i$. That is, $R_i = \{f : h_t \to S_i\}$, where $h_t$ is the history of play up to and including time $t$. The payoff functions $P_1$ and $P_2$ are defined as the limit of the mean of the stage payoffs. This is always defined because we are considering only strategies which can be implemented using finite automata. Since the machines must eventually enter a cycle, we know this limit exists. Therefore, we have
$P_i(r_1, r_2) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \Pi_i(s_1^t, s_2^t),$
where $r_i \in R_i$ and $s_i^t$ is the action taken by player $i$ at stage $t$ under $(r_1, r_2)$.
I also use Abreu and Rubinstein's definition of an automaton selection game. Here $G^\#$ is the automaton selection game defined as $\langle A_1, A_2, U_1, U_2 \rangle$. The strategy space $A_i$ is the set of Moore machines which I will describe more completely later. The functions $U_i$ are the true utility, or profit, functions of the players. At various points in the analysis their precise form will vary. However, these functions basically adjust the payoff functions $P_i$ for complexity. In this automaton selection game we can think of automaton $a \in A_i$ as actually being Player i in $G^\infty$. In a sense, this automaton is Player i's agent who follows the simple instructions represented by the Moore machine.
Now I will define more formally the kind of machine I have in mind as implementing these simple instructions. A Moore machine is described by a four-tuple $\langle Q, q_0, \lambda, \mu \rangle$, where $Q = \{q_0, q_1, q_2, \ldots, q_m\}$ is a finite set of states, $q_0 \in Q$ is the initial state, $\lambda : Q \to \{C, D\}$ is the output function which maps the state into a strategic choice, and $\mu : Q \times \{C, D\} \to Q$ is the transition function which maps the current state and the opponent's choice into a state (not necessarily different). Moore machines not only allow us to model strategies in repeated games concisely and conveniently, they let us analytically calculate payoffs for infinite games.
These machines can be represented quite easily as demonstrated in Figure 3. Each circle in a machine represents a possible state of the machine, and the state on the left is the initial state. I depict the action taken when the machine is in that state by the upper case letter inside the circle. The arrows indicate how the machine transitions after a play of the stage game, and the transition depends on the opponent's choice, indicated by lower case letters. The strategies which are "nice" by Axelrod's definition are those which begin by cooperating and continue cooperating as long as the opponent does. Examples of nice strategies are GRIM and TFT.
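Because a pair of finite machines must eventually revisit a joint state, the limit-of-the-mean payoff can be computed exactly by detecting the cycle, as the text notes. The sketch below (Python; an illustrative reimplementation, not the paper's Turbo Pascal code) encodes GRIM and TFT in the notation just defined; the dictionary encoding is my own, and the stage payoffs are the same assumed values used in the earlier sketch.

```python
from fractions import Fraction

# Assumed stage payoffs (3 and 1 are cited in the text; 4 and 0 are my choices).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 4, ('D', 'D'): 1}

# A Moore machine <Q, q0, lam, mu>: states are integers, lam maps a state
# to an action, mu maps (state, opponent's action) to the next state.
GRIM = {'q0': 0, 'lam': {0: 'C', 1: 'D'},
        'mu': {(0, 'C'): 0, (0, 'D'): 1, (1, 'C'): 1, (1, 'D'): 1}}
TFT  = {'q0': 0, 'lam': {0: 'C', 1: 'D'},
        'mu': {(0, 'C'): 0, (0, 'D'): 1, (1, 'C'): 0, (1, 'D'): 1}}

def limit_mean_payoff(m1, m2):
    """Payoff to m1 against m2 as the limit of the mean: iterate until the
    joint state pair repeats, then average the stage payoffs over the cycle."""
    seen = {}                 # joint state -> step at which it was first visited
    payoffs = []              # m1's stage payoff at each step
    s1, s2 = m1['q0'], m2['q0']
    t = 0
    while (s1, s2) not in seen:
        seen[(s1, s2)] = t
        a1, a2 = m1['lam'][s1], m2['lam'][s2]
        payoffs.append(PAYOFF[(a1, a2)])
        s1, s2 = m1['mu'][(s1, a2)], m2['mu'][(s2, a1)]
        t += 1
    cycle = payoffs[seen[(s1, s2)]:]   # the play repeats from here on
    return Fraction(sum(cycle), len(cycle))

print(limit_mean_payoff(GRIM, TFT))    # 3: the two machines cooperate forever
```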
Determining a reasonable mutation scheme is not without problems. One alternative is to assume it is possible for one strategy to mutate to any other strategy with equal likelihood. I call this "uniform mutation." However, this is not a very satisfying way to proceed. If we are thinking of these machines as decision rules evolving over time, we must develop some idea of "closeness" in the strategies and only allow strategies to mutate to those which are close. Another idea we should try to capture is that simplifying mutations are more likely than complicating mutations. I will use this idea later as a way to capture the costs of complexity.
In order to capture these notions of "closeness" and simplification, I developed a mutation scheme based on how these strategies would appear if they were machines in the physical sense and subject to breakdowns. We can think of these machines as consisting of transmitters and the appropriate wires hooked up to receivers to control which signal is transmitted. The simplest machines are those which need only transmit either "cooperate" or "defect." These correspond to the single-state machines which always defect or always cooperate. Two-state machines are made up of two transmitters which send the signals corresponding to cooperation and defection. I chose to model these strategies so that a machine attempts to move to the next state, or change the signal it sends, after taking an action. However, the transition may be blocked for one of two reasons. First, there may be no circuit from one state to the next. In this case a machine will continually send the same response because it can reach no other state. Second, the machine may perceive an action from the opponent which prevents the transition. We can think of these inhibitors as open switches which keep the state from changing. For example, consider the GRIM strategy which I have modeled in Figure 4. The GRIM machine will begin cooperating and attempt to switch to the defecting state unless the machine receives the signal that the opponent cooperated. The receiver, indicated by a boxed "c," will keep the circuit open as long as it receives the signal that its opponent cooperated in the last stage. If the machine does not detect that its opponent cooperated, the switch will close, completing the circuit which allows the machine to make the transition to the defecting state.
This stylized model of how strategies are implemented offers an opportunity to consider what sensible mutations look like. For example, we can assume a reasonable mutation involves a wire in the machine breaking. We can imagine a strategy being misinterpreted by the player and some transition being missed. If the mutant machine continues to function as well as or better than the machine from which it evolved, we can expect the mutant machine will be copied.
In this essay I will consider "breakdowns" of the machines, or possible mutations in which a wire breaks. Also, I assume it is possible for the signal to get reversed in the machine so the automaton sends C when it should send D and vice versa. This is not the only way to model these strategies as mechanical or electrical devices. However, this is one way which is relatively simple and yields reasonable mutation patterns. I will not examine increasing complexity through mutation. This model of the strategies allows us one opportunity to capture the cost of complexity. The more complex a machine is, the more likely it is to break down into other simpler machines. Therefore, if complexity adds nothing to the payoffs, it will not endure in the population.
Another way I try to capture the costs of complexity is to model parts of the machines as having some cost. This can be applied to the evolutionary dynamic process in one of two ways which closely follow the notions of complexity described by Abreu/Rubinstein and Banks/Sundaram. To capture the idea that states are costly, I adjust the payoffs so the single state machines receive a premium in the play of the game. It turns out the size of the premium affects the result in an evolutionary framework. Also, I impose a cost for maintaining each of these stylized wires/transitions in the machines. Again, the payoff matrix is modified to reflect this idea.
The model of Abreu and Rubinstein does not lend itself to evolutionary simulations. Operationalizing the idea of lexicographic preferences is not straightforward in an evolutionary context. In the model I have in mind, the proportion of the population which plays a particular strategy at time $t + 1$ depends on how well that strategy does at time $t$ relative to the average member of the population. Lexicographic preferences do not allow such simple averaging of payoffs, but we can handle the idea that preferences depend on complexity by imposing costs on the metaplayers.
The evolutionary process I employ in the simulations has two parts. The first attempts to capture the idea that successful strategies or rules of thumb will tend to be copied. The second part reflects the idea that over time these strategies will not be perfectly duplicated, or will not breed true in biological terms. First I will describe the replication dynamic, and then I will describe how the strategies mutate.
To see how the replication dynamic process works, imagine a game in which there is a strategy space $A$ with $n$ strategies or automata, $A = \{a_1, a_2, \ldots, a_n\}$. The RPD is then played at time $t$, and each strategy earns a payoff depending on the strategy with which it is matched. These payoffs can be represented in an $n \times n$ matrix $B$ with element $b_{ij}$ being the payoff to a player who uses machine $a_i$ if he is matched against a player who uses $a_j$; that is, $b_{ij} = U(a_i, a_j)$. Also, let $p(t) = (p_1(t), p_2(t), \ldots, p_n(t))^T$ be the vector of proportions of each type of player in the population at time $t \in \{0, 1, 2, \ldots\}$. We can then define the expected payoff to a player of strategy $a_i$ at time $t$ as $(Bp(t))_i$, the $i$th element of the vector $Bp(t)$. Also, the expected payoff for a member of the population at time $t$ is $p(t)^T Bp(t)$. The process we are considering has the proportion of strategy $i$ at time $t + 1$ equal to its proportion at time $t$ multiplied by the ratio of its own expected payoff to the expected payoff of all players. Then we have the following dynamic process:
$p_i(t + 1) = p_i(t)\,\frac{(Bp(t))_i}{p(t)^T Bp(t)}.$
It is easy to see that whether a strategy's proportion in the population grows or shrinks depends on whether it is doing better or worse than average. The proportion of a strategy in the next generation depends not only on its own relative performance, but also on its proportion in the current population. This captures the idea that a strategy must be both successful and observed by other strategies to be copied.
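One generation of this discrete replicator process takes only a few lines; the sketch below (Python again, not the original Turbo Pascal) runs it on a two-strategy population of GRIM and ALL D. The payoff matrix follows from the text's limit-of-the-mean convention: mutual GRIM play yields 3, and every other pairing settles into mutual defection, whose one-shot deviations vanish in the limit, yielding 1.

```python
import numpy as np

def replicate(p, B):
    """One step of p_i(t+1) = p_i(t) * (B p)_i / (p^T B p)."""
    fitness = B @ p          # expected payoff of each strategy type
    avg = p @ fitness        # population-average payoff p^T B p
    return p * fitness / avg

# Limit-of-the-mean payoffs for {GRIM, ALL D}.
B = np.array([[3.0, 1.0],
              [1.0, 1.0]])

p = np.array([0.5, 0.5])
for _ in range(100):
    p = replicate(p, B)
print(p)   # GRIM's share tends to one: ALL D always does worse than average
```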
The dynamic process described above is based on the idea that some machines or rules of thumb would be so unsuccessful they would probably not be used in the future, while the superior machines would be imitated. We can also think of the payoffs from playing the RPD as fitness in the biological sense. That is, strategies with higher payoffs are able to reproduce (asexually) more successfully than strategies with lower payoffs. In either case, we would expect to see the best strategies flourish and the worst strategies die out. This process allows us to evaluate how well a strategy will do when the less effective ones cease being important, and only the best strategies remain to play each other. The dynamic process I used simulates evolution with an infinitely large population and random matching. Essentially, when a player of a certain strategy type plays the game, he is playing against a mixed strategy which is represented by the different proportions in the population.
In order to capture the notion that strategies mutate in the evolutionary process, I used a matrix $M$ to represent the rate at which strategies change into each other. A typical element $m_{ij}$ is the probability a strategy of type $j$ mutates into a type $i$ player. Therefore, after the reproduction dynamic process described above has taken place, the strategies undergo a mutation process which is described by $M$. Then, we can find the proportions of the population, $p^*(t + 1)$, which play the infinite RPD in period $t + 1$ with the equation $p^*(t + 1) = Mp(t + 1)$, where $p(t + 1)$ is the vector of proportions which results from the dynamic replication process described above.
Some very simple examples may help to see what the mutation matrix does. For example, if we assume one strategy can change into any other strategy with equal probability, $\mu$, then the matrix $M$ has the form $m_{ij} = \mu$ for $i \neq j$ and $m_{ii} = 1 - (n - 1)\mu$, where there are $n$ total strategies. On the other hand, if no mutation takes place the above is true with $\mu = 0$. The only restrictions we need to place on this matrix are (1) $\sum_{i=1}^{n} m_{ij} = 1$ for all $j \in \{1, 2, \ldots, n\}$ and (2) $m_{ij} \in [0, 1]$ for all $i, j \in \{1, 2, \ldots, n\}$. These restrictions are completely reasonable and merely require an individual either stay the same or change. We do not allow one machine of type $a$ to turn into some other number of type $b$ machines.
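A uniform mutation matrix obeying restrictions (1) and (2), together with the post-replication update $p^*(t + 1) = Mp(t + 1)$, can be sketched as follows; the rate 0.0001 matches the one used in the simulations reported below.

```python
import numpy as np

def uniform_mutation_matrix(n, mu):
    """m_ij = mu for i != j and m_ii = 1 - (n - 1) mu.
    Every column sums to one, so total population mass is conserved."""
    M = np.full((n, n), mu)
    np.fill_diagonal(M, 1.0 - (n - 1) * mu)
    return M

n, mu = 26, 0.0001
M = uniform_mutation_matrix(n, mu)
assert np.allclose(M.sum(axis=0), 1.0)   # restriction (1)

p = np.full(n, 1.0 / n)   # e.g., the midpoint of the unit simplex
p_star = M @ p            # proportions that actually play at t + 1
```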
With the stylized machines we can derive a mutation matrix. Rather than list the entire matrix, I will just show how one strategy mutates. Consider the strategy which implements the cooperative outcome in the Abreu and Rubinstein automaton selection game. A diagram of the stylized machine is depicted at the top of Figure 5. Each of the possible mutations is indicated by an arrow beginning at the original machine. The machines are identified by two letters enclosed in a box. The 26 possible one- and two-state machines are depicted in Figure 6. It is easy to see what happens as each of the imagined wires breaks. For example, if the wire which attempts to change actions from D to C (this represents the transition from the defecting state to the cooperating state in the diagram of the Moore machine) breaks, the machine will only be able to send the D signal. If, however, the wire from the receiver on the top of the diagram were to malfunction, this machine would mutate into AC. If the signals got reversed, this CC machine would become cc. We need not go any further to see the CC machine can mutate into DD, AC, CA, CD, or cc if only one of the wires breaks. For simplicity I assume only one wire can break in any generation. We can think of this as approximating independent mutations where the mutation rates are so small the probability of two mutations in a given generation is negligible.
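Under this stylized scheme the mutation matrix is sparse: a machine can turn only into its "neighbors" under a single wire break or a signal reversal. A sketch of how such a neighbor table could drive the matrix follows; the CC row is the one derived in the text, the ellipsis marks the rows for the other machines, and splitting one overall rate equally across a machine's neighbors is my own assumption about how "one wire breaks per generation" is operationalized.

```python
import numpy as np

# Single-breakdown neighbors of each machine (labels as in Figure 6).
# Only the CC row, derived in the text, is filled in here.
NEIGHBORS = {
    'CC': ['DD', 'AC', 'CA', 'CD', 'cc'],
    # ... analogous entries for the remaining machines ...
}

def stylized_mutation_matrix(labels, mu):
    """m_ij: probability machine j becomes machine i, with a total
    mutation rate mu split equally across j's breakdown neighbors."""
    idx = {s: k for k, s in enumerate(labels)}
    M = np.eye(len(labels))
    for j, nbrs in NEIGHBORS.items():
        for i in nbrs:
            M[idx[i], idx[j]] += mu / len(nbrs)
        M[idx[j], idx[j]] -= mu
    return M

M = stylized_mutation_matrix(['CC', 'DD', 'AC', 'CA', 'CD', 'cc'], 0.0001)
```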
Finally, before actually discussing the results of the simulations, I will describe the twenty-six strategies I used. These strategies can be represented by all of the possible one- and two-state Moore machines. These simple strategies can be thought of as decision rules which are very easily implemented. It is not correct to say these machines are restricted to a memory of only one period; in a sense, the GRIM strategy has a memory of infinite length. However, the set I describe here cannot accommodate strategies which need to count to any number greater than one. It is interesting to note the most successful strategy in Axelrod's RPD tournament, TFT, is represented here. Nearly all the strategies which enjoyed some degree of success in the tournament used TFT as the main rule.(8) All of the strategies are described in Figure 6.
Eight of the possible twenty-six strategies form a Nash equilibrium when they play against a machine identical to themselves. These equilibrium strategies include all of the "nice" strategies except "always cooperate." Using the notation here, these are ca, cb, cc, and cd. In addition, the utility maximizing equilibrium strategy from the Abreu and Rubinstein model,(9) CC, is an equilibrium strategy when it plays itself. The three other strategies which form equilibrium when they play themselves are the "always defect" strategy (DD), aa, and AC. The aa machine cooperates on the first move and then begins defecting forever. The AC machine begins by defecting and then switches to a cooperating state where it stays until the opponent defects. An opponent's defection will move this machine back to the initial state. Once in the initial state again, it repeats the same sequence of play by defecting once and then moving to the cooperating state until it detects another defection.
IV. Simulation Results
In this section I briefly describe the results of the simulations I performed, saving most of the discussion for the next section, where I analyze which strategies do well under the various mutation schemes and draw conclusions about what characteristics the successful strategies have. These will not be in strict agreement with the characteristics identified by Axelrod. There are some circumstances in which "niceness"(10) appears to be a successful characteristic, but under other circumstances it may not be.
The simulations were performed on a Zenith 248 computer with programs written in Turbo Pascal. In the evolutionary dynamic simulations, each machine was assigned a fitness based on how well it does against the other strategies and against other machines like itself. The proportion in the next generation, before mutation, is the proportion in the current generation times the ratio of the individual strategy's fitness to the average fitness in the population. After the evolutionary process takes place, a strategy changes into another with a probability determined by the appropriate mutation scheme.
Evolution without Mutation
I performed one hundred simulations of one thousand generations each without any mutation as a control to determine which, if any, particular strategies are the likely results of a strictly ecological process. Each simulation began from a randomly selected starting point in the unit simplex in $\mathbb{R}^{26}$, and one thousand generations of the evolutionary dynamic process described above were then simulated. The significance of the ecological simulations is that once a strategy nearly dies out, there are no trembles to increase its proportion again. This ecological simulation is very similar to what Axelrod did with the results of his RPD tournament.
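The paper does not say how the random starting points were drawn; one standard way to sample uniformly from the unit simplex in $\mathbb{R}^{26}$ is to normalize exponential draws (equivalently, a Dirichlet(1, ..., 1) sample), sketched here as one plausible reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_simplex_point(n=26):
    """Uniform draw from the unit simplex: normalized exponentials."""
    x = rng.exponential(size=n)
    return x / x.sum()

starts = [random_simplex_point() for _ in range(100)]
```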
I graphically describe the average of the proportions of the population for the final generation in each of the simulations in Figure 7. It is clear the cooperative outcomes proved the most evolutionarily fit in this environment. In fact, we can see the only strategies to survive this dynamic process are the "nice" ones, and all of them survived.(11) It is interesting that the strategy which is most exploitable in some sense, "always cooperate," survives in this ecological process. This is because it does well against those other strategies which do well. Specifically, this strategy does well against the other "nice" strategies, and those "nice" strategies drive the "nasty" strategies to near extinction quickly enough so the nicest and infinitely forgiving strategy, ALL C, can survive.
Evolution with Uniform Mutation
I then performed similar simulations using uniform mutation with the same one hundred random starting points I used in the previous simulations. I chose a mutation rate of 0.0001. I present the data in Figure 8 in the same format I used for the evolution without mutation. The same set of strategies that did well in the last simulation are successful in this simulation.(12) The two strategies which perform significantly differently are TFT and ALL C. TIT-FOR-TAT does not do as well in the presence of mutation, and ALL C performs better. The intuition for why TFT does not do as well seems relatively clear: TFT is not as aggressive in exploiting poor strategies as GRIM. Hence, when these poor strategies appear, GRIM does relatively better than TFT, and TFT's proportion of the population diminishes over time.
Note that in order for a strategy which is a large proportion of the population to keep from becoming smaller, it must do well relative to the others because of the mutation process. The larger the proportion of a given strategy in the population, the better that strategy must do to keep its proportion from shrinking. If all strategies did equally well, this mutation scheme would equilibrate when all strategies are in equal proportion. However, the periodic introduction of poor strategies allows GRIM to do better than TFT. The improvement of ALL C seems to be the result of the mutation scheme. Although ALL C doesn't do very well against "nasty" strategies, there are not many of them around. Since ALL C is a small proportion of the population, it is a net gainer from the mutation process. Additionally, since most of the population is nice it does not do very badly in terms of payoffs, so over time it grows in size until it attains the levels found in this simulation. We can think of strategies like GRIM as "enforcers" in this game. That is, GRIM keeps the proportions of the "nasty" strategies low enough that ALL C can survive.
Evolution with Stylized Mutation
The work of Abreu and Rubinstein hypothesizes that less complex strategies are better than more complex strategies. The notion I attempt to capture in this simulation is that more complex strategies will tend to break down more often than less complex strategies. Hence, unless the extra complexity results in increased stage game payoffs, the more complex strategies will tend to be replaced by less complex strategies through evolution. We can think of these "rules of thumb" mutating to simpler rules. If making a rule less complex does not reduce its payoffs in the stage games, we should expect the simpler rule to be copied more often. Again, which rules are successful will depend on the environment.
I simulated one hundred evolutionary processes of one thousand generations once again, and I started at the same random starting points as in the previous two simulations with the same mutation rate of 0.0001. I summarize the results in Figure 9. Again, the GRIM strategy does much better than any of the others. The intuition is slightly different here. Not only does the GRIM strategy exploit irrational strategies, but it is one of the least complex two-state machines in terms of the model I chose because it only needs to perceive one deviation and perform one transition. Hence, it will increase in proportion as a result of mutation from other successful strategies as well as its own strategic fitness.
Mutants Who Enter as Groups (Uniform Mutation)
Next, I attempted to capture the possibility that strategies can grow in their own small communities and then attempt to invade the population all at once. The biological story which goes with this simulation has a certain group of animals which are separated from the rest of their species when the land they are on is cut off from the main land mass by water. The animals live in isolation and develop their own characteristics. At some point, a land bridge forms, and this relatively large group of mutants enters the population. I attempt to capture the essence of this thought experiment by allowing one percent of the population to become a mutant type and attempt to invade the population. Then I allow the population dynamics to settle down again. Once the mutant strategy's influence is absorbed by the population, another band of mutants enters. This process continues.
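The thought experiment translates into a simple loop: let the replicator dynamic settle, convert one percent of the population into a single mutant type, and repeat. In the sketch below the settlement test (proportions moving less than a tolerance) is my own operationalization; the one-percent share, the uniform choice of invader, and the $10^{-5}$ extinction threshold come from the text.

```python
import numpy as np

def replicate(p, B):
    f = B @ p
    return p * f / (p @ f)

def run_until_settled(p, B, tol=1e-9, max_gens=100_000):
    """Iterate the replicator dynamic until proportions stop moving."""
    for _ in range(max_gens):
        q = replicate(p, B)
        if np.max(np.abs(q - p)) < tol:
            return q
        p = q
    return p

def group_invasions(p, B, rng, waves=1_000, share=0.01):
    """B is the 26 x 26 limit-of-the-mean payoff matrix among the machines."""
    for _ in range(waves):
        p = run_until_settled(p, B)
        mutant = rng.integers(len(p))   # uniform mutation: any type may enter
        p = (1.0 - share) * p           # one percent of the population ...
        p[mutant] += share              # ... becomes the mutant type
        p[p < 1e-5] = 0.0               # extinction threshold from the text
        p /= p.sum()
    return p
```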
I performed this simulation one time for thirty thousand generations. I chose to use the mid-point of the unit simplex in $\mathbb{R}^{26}$ as the initial distribution. That is, I began with every strategy as 1/26 of the population. Also, I chose to regard a particular strategy as extinct once its proportion of the population falls below $10^{-5}$. This allows us to more easily identify when the population stabilizes. In this exercise, I allow all of the strategies to enter with equal probability. That is, I assume uniform mutation. I show the final population in Figure 10. It is interesting to note that after the population stabilizes from the initial starting point, it changes very little. Specifically, after initial stabilization and before mutants attempt to invade the population, the "nice" strategies are in these proportions: ca, 0.575; cb, 0.152; cc, 0.164; cd, 0.105; and dd, 0.005. Here again, the "nice" strategies do very well.
We cannot say these strategies are immune from invasion in this model because there is a very small probability the "always cooperate" strategy will be the invading mutant for a sufficient number of consecutive generations so that the population can then be successfully invaded by "always defect." Since this probability, although extremely small, is greater than zero, the event will take place if the process continues long enough. However, we can say this set of strategies is nearly invasion-proof in the above sense. This idea is akin to the concept formalized by Binmore and Samuelson [6] which they call a Payoff-equivalent, Polymorphous Modified ESS (PPMESS), without the complexity criteria. The idea behind this concept is certain groups of strategies may exist which cannot be successfully invaded. The individual proportions of each strategy will change as various types of potential invaders appear, but the population cannot be invaded.
It is somewhat surprising that the ALL C strategy survives in such a large proportion. The reason for this is clear. Although the ALL C strategy can be exploited, it does not exist in large enough proportions to allow a "nasty" strategy to invade. It is not difficult to see that if only the GRIM, ALL C, and ALL D strategies were possible, ALL D would not be able to invade a "nice" population until ALL C accounted for at least 2/3 of the population. However, this is extremely unlikely because as ALL D appears through mutation it keeps the proportion of ALL C low.
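The two-thirds threshold can be checked directly under limit-of-the-mean payoffs. Figure 1 is not reproduced here, so the temptation payoff of 4 below is an assumption (it is the value that makes the stated threshold exact), while the payoffs of 3 for mutual cooperation and 1 for mutual defection are cited elsewhere in the text.

```latex
% Nice population: GRIM with share 1 - c, ALL C with share c.
% Every nice strategy earns 3 against every other nice strategy; an
% infinitesimal ALL D invader earns 1 against GRIM (the one exploited
% round vanishes in the limit of the mean) and 4 against ALL C.
\[
  \underbrace{(1 - c)\cdot 1 + c\cdot 4}_{\text{ALL D's expected payoff}}
  \;>\; \underbrace{3}_{\text{the natives' payoff}}
  \;\Longleftrightarrow\; 3c > 2
  \;\Longleftrightarrow\; c > \tfrac{2}{3}.
\]
```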
Evolution with Mutants Who Enter as Groups (Stylized Mutation)
In this simulation I attempt to capture the same elements as in the last one, except that I use the mutation scheme from our stylized strategy-implementing machines. In this example the strategies which attempt to invade in a group must come from the set of mutants which are possible from one of the strategies in the population. The probability a strategy mutates in this simulation is its proportion of the population; any possible mutant from that strategy is then equally likely. Here we require those who are "stranded on the island" and then become the invading mutants be reasonably similar to the population from which they came. This simulation was performed like the previous one. The final population is described in Figure 11.
The success of GRIM and ALL C is evident here. These two strategies, which appeared successful in the previous simulations, are the only two survivors of this process. The intuition here again relates to their relative simplicity. Here we seem to have captured the idea that simple strategies will be more evolutionarily fit. Although GRIM is more complex than ALL C, the fact that ALL D enters periodically as a mutant invader keeps ALL C in small enough proportions so ALL D cannot successfully invade the population. The same sort of symbiotic relationship described by Binmore and Samuelson is present again. I will discuss in the next section why the "stable" group of strategies in this model differs from the one in Binmore and Samuelson's model.
Evolution with Costly States
In this subsection the automata represent costly rationality in addition to bounded rationality. We can think of this as increased strategic complexity requiring more time and resources. Rather than deal with different types of mutation schemes, I will allow mutation in these simulations from our stylized model of the strategies with a mutation rate of 0.0001. Here I want to focus primarily on the effects of costly complexity on the evolutionary dynamic process.
In order to capture the idea that states are costly, I supplement the payoffs of the one-state machines (ALL C and ALL D) by 0.05 and 0.1. I simulated five thousand generations of this dynamic process, beginning from the center of the unit simplex in $\mathbb{R}^{26}$. I present the results in Figures 12 and 13. These diagrams show how the population changes over time. I have chosen to represent only the GRIM, ALL C, and ALL D strategies because their dynamic behavior accounts for all of the interesting phenomena.
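Operationally, the state-cost treatment just adds the premium to every payoff earned by a one-state machine; a sketch (the function and argument names are illustrative):

```python
import numpy as np

def add_state_premium(B, one_state_rows, premium):
    """Supplement the payoffs of the one-state machines (ALL C and
    ALL D) by `premium` (0.05 or 0.1 here) against every opponent."""
    B = B.copy()
    B[one_state_rows, :] += premium
    return B

# Hypothetical usage, with idx_ALLC and idx_ALLD the rows of the two
# one-state machines in the 26 x 26 payoff matrix B26:
#   B26 = add_state_premium(B26, [idx_ALLC, idx_ALLD], 0.05)
```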
Again, the GRIM and ALL C strategies play an important role in the evolutionary process. When having states is less costly, we can see the ALL D strategy cannot successfully invade the population on a permanent basis. However, when ALL C gets very large, which will occur through both mutation and the supplemented payoffs, the ALL D strategy nearly invades but then dies out before it can complete the invasion. However, if the cost of maintaining a state is sufficiently large, ALL D can successfully invade. Since it is one of the simplest machines, there will not be any mutation which can threaten its existence. Note, however, this result depends on the specific values chosen for the complexity costs.
We can see in Figures 12 and 13 that if states are sufficiently costly, ALL D will be the evolutionary outcome. The population will initially tend toward being all "nice." Since the "nasty" strategies are in very small proportions in the population, the ALL C strategy does very well because of the higher payoff. When it becomes a sufficiently large proportion of the population, it is invaded by ALL D. If the supplemental payoff is not great enough, ALL D will enjoy a brief period of success, but will again be displaced by "nice" strategies, especially GRIM, and the cycle starts again.
Evolution with Costly Transitions
In these simulations I attempt to capture the idea that maintaining transitions is costly. That is, any time a decision must be made by the person implementing these rules, there is an associated cost. The major difference in these simulations relative to the previous ones is that here GRIM is more successful than any of the other "nice" two-state machines. The reason is that GRIM is less complex than some of the other strategies. That is, it is an easier rule to apply than rules like TFT, but clearly it is more complex than ALL C. Again, GRIM is able to take advantage of any of the poor strategies and reap the benefits of cooperation.
The evolutionary results are depicted in Figures 14 and 15. Here I use penalties for complexity of 0.01 and 0.1 per "wire" in the stylized machines. We can see that as long as the penalty is not too great, the population will tend to cycle. It begins by evolving to nearly all GRIM. However, since ALL C earns a higher payoff because it is less complex, it eventually becomes a very large portion of the population. When the proportion of ALL C in the population is large enough, ALL D's proportion grows rapidly. Then, unless the penalty is too large, GRIM emerges as the dominant strategy, and the cycle begins once more. If the penalty is large enough, though, ALL D will eventually dominate the population.
V. Discussion
These simulations suggest a number of things. The most conspicuous result is the evolutionary success of the GRIM strategy. This is surprising since it has not appeared as the result of any previous evolutionary simulation I am aware of. The argument against the GRIM strategy being successful is that it is not forgiving. This is not a problem in these simulations because two-state Moore machines cannot periodically probe for weakness in their opponents' strategies. However, there is something to be learned from this result. One possible reason for GRIM's success is that it can exploit poor strategies. It is not difficult to see that TFT can never do strictly better than its opponent because it merely mirrors its opponent. GRIM can exploit irrational play better than TFT. When GRIM and TFT play the other strategies, GRIM is usually better at taking advantage of bad play. Of the twenty-four other strategies, GRIM earns more than TFT against fifteen of them. On the other hand, TFT does better than GRIM against only two of them.
The results of these simulations fall far short of suggesting GRIM is evolutionarily superior to all other strategies in the RPD, but they do show a profound weakness in the TFT strategy. A strategy which just copies its opponent cannot take advantage of poor play which will be present through such processes as mutation. These simulations do, however, suggest the most successful strategies will be those able to take advantage of irrational play in an evolutionary situation where mutants enter over time.
It is obvious the GRIM strategy doesn't exploit all irrational play because the ALL C strategy survived in positive proportions in all of the simulations without complexity costs. This is because the GRIM strategy cannot identify the ALL C strategy when they play each other. Binmore and Samuelson [6] suggested the possibility that irrational strategies and rational strategies could exist together as the consequence of an evolutionary dynamic process. In these simulations there may be, in the long run, a substantial number of irrational players. However, the fact that they look exactly like GRIM when playing it keeps them from being exploited. Also, any strategy which attempts to exploit these irrational strategies is forced to take the noncooperative payoff in the game most of the time. Therefore, the "nice" strategy GRIM protects, in a sense, the irrational players. I refer to it as an "enforcer" strategy which keeps the "nasty" strategies from invading the population. If for some reason the number of ALL C players were to increase, ALL D could invade the population. In the simulations I performed that was an extremely unlikely event. Generally, whenever the proportion of ALL C grew relative to other strategies, the ALL D and other "nasty" strategies would force it to become smaller. However, because nearly all of the strategies in the population were "nice," ALL C survived in small proportions while the "nasty" strategies earned very small payoffs relative to the cooperative strategies and, hence, died off rapidly.
The "nice" strategies in this simulation form something akin to Binmore and Samuelson's Payoff-equivalent Polymorphous MESS (PPMESS). However, it is clear they are not a PPMESS in the sense of Binmore and Samuelson. The reason for this gets to the heart of the differences between the Abreu/Rubinstein and Binmore/Samuelson models and the evolutionary processes I simulated here.
The simulations I performed have a number of different strategies entering over time either simultaneously or as a single strategy type in a group. In other words, my simulations have the populations being repeatedly perturbed. However, most studies in this area analyze the stability of populations against a single perturbation. Here, though, we look at how population mixtures fare while being repeatedly attacked by possible mutant strategies.
A discussion of Binmore and Samuelson's results will help clarify the issue. They suggest a mixture of the cc, cd, CC, and CA machines forms a PPMESS. That is, an appropriate mixture of these strategies cannot be successfully invaded by a mutant strategy. This is true because they earn equal payoffs against each other and have equal complexity. There is a symbiotic relationship among these strategies which protects them from invasion. However, this is true only if we consider these perturbations as one-time events. Certainly this mix of strategies is immune to a small proportion of invading mutants. Although the exact mix of strategies may change after a mutant invasion is thwarted, the same set of strategies will survive. However, in repeated simulations I found these machines are not immune to invasion when mutants repeatedly attempt to enter the population because the GRIM strategy will be able to invade after some number of unsuccessful tries. When a GRIM machine plays the Abreu/Rubinstein utility maximizing equilibrium machine (UTIL), it earns higher payoffs against UTIL than UTIL earns against it. Hence, we have a situation where cc, cd, and CA attempt to invade the population over time through mutation and are successful because they earn the cooperative payoff when they play themselves and UTIL. However, after these strategies enter the population, GRIM can successfully invade if it appears often enough. Each time GRIM attempts to invade, the proportion of UTIL remaining decreases relative to cc and cd (both "nice" strategies) because GRIM earns the cooperative payoff against them and does better against UTIL than UTIL does against GRIM. Also, GRIM earns the maximum possible payoff against CA. Therefore, after enough attempts GRIM will successfully invade the population.
The ideas of perfect equilibria and trembles are important to these simulations. Reinhard Selten [20] and others have used the idea of trembles in the play of a game to select among multiple equilibria. He also applies this idea to evolutionary stability to expand the set of evolutionarily viable strategies. However, applying the idea of trembles to the play of the automaton selection game is not straightforward. We cannot allow a machine to "misplay" during an infinite RPD because there appears to be no sensible way to define such a tremble. If a machine were to change strategies with positive probability during the game, the change in payoffs could be discontinuous because we calculate the payoffs as the limit of the mean. To see this, suppose two GRIM strategies were playing the RPD, and with some arbitrarily small positive probability one changes to ALL D during the game. We know the change will occur in finite time. Hence, the payoffs will switch to the defecting payoff for both machines. Such a discontinuity makes this sort of tremble unusable. Instead, I consider trembles in the evolutionary process in these simulations; that is, what happens if the strategies do not breed true. Another way of looking at the problem is to imagine what happens if there are perturbations to the payoff matrix.
These trembles point out another difference between these simulations and the work by Abreu/Rubinstein and Binmore/Samuelson. In this analysis there is no such thing as an unused state: because mutant strategies can appear over time, every state of a machine will eventually be reached by some opponent's play, so the idea of unused states means nothing here. This goes back to the fact that in other evolutionary models only one strategy or mix of strategies enters at a time; or, perhaps more precisely, those models only analyze how a strategy or mix of strategies fares against a single potential mix of invaders. Without the aid of computer simulations it is very difficult to capture the interactions among strategies as well as the impact of a particular mutation scheme.
It is also interesting to note that if the penalty for complexity is sufficiently small, the "nice" strategies remain immune to successful invasion. Only when the cost of complexity becomes sufficiently large does it affect the qualitative results of the evolutionary simulation. We can think of complexity cost as the cost of monitoring an opponent and responding accordingly; if the cost of monitoring actions rises far enough, the defecting outcome will prevail.
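One simple way to formalize this, under the assumption of a linear cost c per machine state (the functional form is mine, not necessarily the penalty used in the simulations), is to subtract the complexity cost from the game payoff:

    def fitness(strategy, population, payoff, num_states, c):
        # Game payoff against the current mix, less a linear complexity cost.
        game = sum(payoff[strategy][t] * share for t, share in population.items())
        return game - c * num_states[strategy]

With the payoffs assumed earlier, a population of two-state GRIM machines earns 3 - 2c, while a one-state ALL D entrant earns 1 - c against them, so ALL D gains the advantage only when c > 2; below that threshold the "nice" strategies keep their edge, consistent with the qualitative pattern described above.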
Finally, I will comment on the obvious success of the GRIM strategy in these simulations. We must keep in mind that only two-state machines were considered, which is a severe limitation on the complexity we allow. In fact, a strategy essentially identical to GRIM was submitted to Axelrod's RPD tournament and finished fifty-second out of sixty-three strategies. However, the success of GRIM here captures something which has frequently been overlooked in this sort of model: for a strategy to survive over time, it should be able to exploit possible bad strategies. The idea that TFT should be in some sense the "best" strategy in an infinite RPD suffers from the significant failing that TFT never does strictly better than any strategy it meets. It seems reasonable to expect that a strategy which succeeds over time should be able to take advantage of bad players who appear.
VI. Summary
This essay reports the results of a number of computer simulations of infinite RPDs. The results are somewhat surprising because the GRIM strategy is clearly the most successful of the cooperative strategies. I attribute this to the fact that GRIM is able to both exploit poor strategies and earn the cooperative payoff against other "nice" strategies. The results do not suggest GRIM would be the "best" strategy in all environments.
This work supports several results of Binmore and Samuelson's recent work; in particular, the idea of a PPMESS seems valid in evolutionary models. The differences in our results stem from the fact that the strategies in my simulations are subject to repeated perturbations, while they analyze a more tranquil game in which intruders appear only once, either one at a time or in certain mixes. The reason for performing these simulations is to capture what happens when many things are happening at once. Analyzing these problems analytically is intractable because of the complex interrelationships among the strategies and the dynamic process. Through the use of computer simulations, we are able to see how all of these forces affect the final population under these specific conditions.
Notes
(1) Robert Aumann [2] provides a proof and further discussion.
(2) Maynard Smith [14] offers an excellent discussion of this point.
(3) The idea is similar to Selten's concept of perfect equilibrium. For an excellent discussion, see Samuelson [19].
(4) The GRIM strategy begins by cooperating, but if either player defects it defects forever.
(5) By this they mean the sum of the payoffs is maximized.
(6) A payoff dominant equilibrium outcome is one in which both players obtain a higher payoff than they would get in any other equilibrium outcome. Obviously these do not always exist.
(7) Specifically, he considered error rates of 1 percent and 5 percent.
(8) Linster [13] discusses Axelrod's prisoners' dilemma tournament with alternate payoff schemes.
(9) I follow Binmore and Samuelson [6] and call this machine "UTIL."
(10) Axelrod [3] referred to strategies which are not the first to defect as "nice."
(11) By "surviving" I mean the strategies were represented in the population in a proportion greater than 10^-10.
(12) Here I refer to those strategies which survived in a proportion at least three times the mutation rate as doing well.
References
[1] Abreu, Dilip and Ariel Rubinstein, "The Structure of Nash Equilibrium in Repeated Games with Finite Automata." Econometrica, November 1988, 1259-81.
[2] Aumann, Robert J., "Survey of Repeated Games," in Essays in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern, symposium volume. Mannheim: Bibliographisches Institut, 1981, pp. 11-42.
[3] Axelrod, Robert. The Evolution of Cooperation. New York: Basic Books, 1984.
[4] Banks, Jeffrey and Rangarajan Sundaram, "Repeated Games, Finite Automata, and Complexity." Games and Economic Behavior, June 1990, 97-117.
[5] Binmore, Kenneth G. and Larry Samuelson. "Notes on Evolutionary Stable Strategies in Repeated Games Played by Finite Automata." Unpublished manuscript, The University of Michigan, 1989.
[6] -- and --. "Evolutionary Stability in Repeated Games Played by Finite Automata." Unpublished manuscript, The University of Michigan, 1990.
[7] Boyd, Robert and Jeffrey Lorberbaum, "No Pure Strategy is Evolutionary Stable in the Repeated Prisoner's Dilemma Game." Nature, May 1987, 58-59.
[8] Farrell, Joseph and Roger Ware, "Evolutionary Stability in the Repeated Prisoner's Dilemma." Theoretical Population Biology, 1989, 61-66.
[9] Fudenberg, Drew and Eric Maskin. "Evolution and Cooperation in Noisy Repeated Games." Harvard Institute for Economic Research Discussion Paper 1469, Harvard University, 1990.
[10] Harsanyi, John. Rational Behavior and Bargaining Equilibrium in Games and Social Situations. Cambridge: Cambridge University Press, 1977.
[11] -- and Reinhard Selten. A General Theory of Equilibrium Selection in Games. Cambridge, Mass.: MIT Press, 1989.
[12] Kim, Yong-Gwan, "Evolutionary Stable Strategies in the Repeated Prisoner's Dilemma." Unpublished manuscript, The University of California, San Diego, 1989.
[13] Linster, Bruce G. "Essays on Cooperation and Competition." Ph.D. Dissertation, The University of Michigan, 1990.
[14] Maynard Smith, John. Evolution and the Theory of Games. Cambridge: Cambridge University Press, 1982.
[15] -- and George R. Price, "The Logic of Animal Conflict." Nature, November 1973, 15-18.
[16] Miller, John. "The Evolution of Automata in the Repeated Prisoner's Dilemma." Unpublished manuscript, The University of Michigan, 1987.
[17] Neyman, Abraham, "Bounded Complexity Justifies Cooperation in the Finitely Repeated Prisoner's Dilemma." Economics Letters, 1985, 227-29.
[18] Radner, Roy. "Can Bounded Rationality Resolve the Prisoner's Dilemma?", in Essays in Honor of Gerard Debreu. New York: North-Holland, 1986.
[19] Robson, Arthur J. "Efficiency in Evolutionary Games: Darwin, Nash, and the Secret Handshake." University of Western Ontario Department of Economics Working Paper, University of Western Ontario, 1989.
[20] Selten, Reinhard, "Evolutionary Stability in Extensive Two-Person Games." Mathematical Social Sciences, 1983, 269-363.