Saturday, July 28, 2012

Iterated Prisoner's Dilemma is an Ultimatum Game

Amazing new results on iterated prisoner's dilemma (IPD) by Bill Press (Numerical Recipes) and Freeman Dyson. There is something new under the sun. Once again, physicists invade adjacent field and add value.
Extortion and cooperation in the Prisoner’s Dilemma (June 18, 2012) 
The two-player Iterated Prisoner’s Dilemma game is a model for both sentient and evolutionary behaviors, especially including the emergence of cooperation. It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpectedly do exist. In particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y’s score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores. Against such a player, an evolutionary player’s best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better, in which case Iterated Prisoner’s Dilemma is an Ultimatum Game.
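Claim (i) in the abstract is easy to check numerically. What follows is a minimal sketch, not code from the paper. It assumes the conventional payoffs (T, R, P, S) = (5, 3, 1, 0) and a memory-one opponent Y, and the helper names (`stationary`, `equalizer`, `y_score`) are made up for illustration. X picks her four cooperation probabilities so that p - (1, 1, 0, 0) = beta * S_Y + gamma, which should pin Y's long-run score at -gamma / beta whatever Y does.

```python
import numpy as np

# Assumed conventional payoffs, T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
# Y's payoff in each state, states ordered (cc, cd, dc, dd) with X's move first.
S_Y = np.array([R, T, S, P])

def stationary(p, q):
    """Stationary distribution of the memory-one game between X (p) and Y (q)."""
    p = np.asarray(p, dtype=float)
    # Y indexes states from his own side, so swap his cd/dc entries.
    qx = np.array([q[0], q[2], q[1], q[3]], dtype=float)
    # Row i: probabilities of moving from state i to (cc, cd, dc, dd).
    M = np.column_stack([p * qx, p * (1 - qx), (1 - p) * qx, (1 - p) * (1 - qx)])
    # Solve v M = v together with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

def equalizer(target, beta=-0.25):
    """X's strategy that should fix Y's score at `target` (beta is a free scale)."""
    gamma = -beta * target
    return beta * S_Y + gamma + np.array([1.0, 1.0, 0.0, 0.0])

def y_score(p, q):
    """Y's long-run average payoff."""
    return float(stationary(p, q) @ S_Y)

if __name__ == "__main__":
    p = equalizer(2.0)  # (0.75, 0.25, 0.5, 0.25) for these payoffs
    rng = np.random.default_rng(0)
    for _ in range(3):
        q = rng.uniform(0.05, 0.95, size=4)  # an arbitrary opponent
        print(np.round(q, 2), y_score(p, q))
```

Only targets and values of beta that keep all four entries of p between 0 and 1 give a playable strategy; the sketch does not check this.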
Accompanying commentary in PNAS. See these comments by Press and Dyson.
[[Press]] I was originally wondering about a much more modest question that, annoyingly, I couldn’t find already answered in the Prisoner’s Dilemma literature. ... The story now becomes one of symbiosis between computer and human intelligence: The computer could find instances, but not generalize them. I knew that the exact complement of computer intelligence, as yin to yang, is Freeman-Dyson-intelligence. So I showed what I had to Freeman. A day later, he sent me an email with the general result, equations 1-7 in our paper, all worked out. These equations immediately expose all the ZD strategies, including the successful extortionate ones. 
... The successful extortionate strategies have been mathematically present in IPD from the moment that Axelrod defined the game; they just went, seemingly, unnoticed. On a planet in another galaxy, seven million years ago, Axelrod-Prime independently invented the same IPD game. He (it?) was of a species several times more intelligent than Homo sapiens [[i.e., like Dyson!]] and so recognized immediately that, between sentient players, the IPD game is dominated by an obvious extortionate strategy. Hence, for Axelrod-Prime, IPD was just another instantiation of the well-studied Ultimatum Game. He (it?) thus never bothered to publish it.
The history of IPD shows that bounded cognition prevented the dominant strategies from being discovered for over 60 years, despite significant attention from game theorists, computer scientists, economists, evolutionary biologists, etc. Press and Dyson have shown that IPD is effectively an ultimatum game, which is very different from the Tit for Tat stories told by generations of people who worked on IPD (Axelrod, Dawkins, etc., etc.).

How can we expect markets populated by apes to find optimal solutions in finite time under realistic conditions, when the underlying parameters of the game (unlike in IPD) are constantly changing? You cannot think of a simpler quasi-realistic game of cooperation and defection than IPD, yet the game was not understood properly until Dyson investigated it! Economists should think deeply about the history of the academic study of IPD, and what it implies about rationality, heuristics, "efficient" markets (i.e., everyone can be wrong for a long, long time). 

For evolutionary biologists: Dyson clearly thinks this result has implications for multilevel (group vs individual) selection:

... Cooperation loses and defection wins. The ZD strategies confirm this conclusion and make it sharper. ... The system evolved to give cooperative tribes an advantage over non-cooperative tribes, using punishment to give cooperation an evolutionary advantage within the tribe. This double selection of tribes and individuals goes way beyond the Prisoners' Dilemma model.
See also What use is game theory? and Plenty of room at the top.

Zero-Determinant Strategies in the Iterated Prisoner’s Dilemma provides a pedagogical summary of the new results.
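To make the extortion claim concrete, here is a small sketch under stated assumptions: conventional payoffs (T, R, P, S) = (5, 3, 1, 0), a memory-one opponent, and illustrative function names (`stationary`, `extortion`, `scores`) that are not from the paper. X sets p - (1, 1, 0, 0) = phi * [(S_X - P) - chi * (S_Y - P)], and the long-run scores should then satisfy s_X - P = chi * (s_Y - P) no matter what Y plays.

```python
import numpy as np

# Assumed conventional payoffs, T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
# Payoffs per state, states ordered (cc, cd, dc, dd) with X's move first.
S_X = np.array([R, S, T, P])
S_Y = np.array([R, T, S, P])

def stationary(p, q):
    """Stationary distribution of the memory-one game between X (p) and Y (q)."""
    p = np.asarray(p, dtype=float)
    # Y indexes states from his own side, so swap his cd/dc entries.
    qx = np.array([q[0], q[2], q[1], q[3]], dtype=float)
    M = np.column_stack([p * qx, p * (1 - qx), (1 - p) * qx, (1 - p) * (1 - qx)])
    # Solve v M = v together with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

def extortion(chi, phi):
    """X's cooperation probabilities for extortion factor chi and scale phi."""
    p_tilde = phi * ((S_X - P) - chi * (S_Y - P))
    return p_tilde + np.array([1.0, 1.0, 0.0, 0.0])

def scores(p, q):
    """Long-run average payoffs (s_X, s_Y)."""
    v = stationary(p, q)
    return float(v @ S_X), float(v @ S_Y)

if __name__ == "__main__":
    chi = 3.0
    p = extortion(chi, 1.0 / 26.0)  # equals (11/13, 1/2, 7/26, 0)
    rng = np.random.default_rng(0)
    for _ in range(3):
        q = rng.uniform(0.05, 0.95, size=4)  # an arbitrary opponent
        s_x, s_y = scores(p, q)
        print(s_x - P, chi * (s_y - P))  # the two columns should agree
    # With chi = 1 and phi = 1/5 the same formula gives tit-for-tat.
    print(extortion(1.0, 0.2))
```

For these payoffs phi has to stay in (0, 1/(1 + 4 chi)] for the entries of p to be valid probabilities. Setting chi = 1 with the largest allowed phi gives (1, 0, 1, 0), which is tit-for-tat.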


  1. From the paper:
    "To summarize, player X, witting of ZD strategies, sees IPD as a very different game from how it is conventionally viewed. She chooses an extortion factor χ, say 3, and commences play. Now, if she thinks that Y has no theory of mind about her (13) (e.g., he is an evolutionary player), then she should go to lunch leaving her fixed strategy mindlessly in place. Y’s evolution will bestow a disproportionate reward on her. However, if she imputes to Y a theory of mind about herself, then she should remain engaged and watch for evidence of Y’s refusing the ultimatum (e.g., lack of evolution favorable to both). If she finds such evidence, then her options are those of the ultimatum game (16). For example, she may reduce the value of χ, perhaps to its “fair” value of 1."

    I was curious about this, so I ran some simulations of players all playing the extortionate strategy. What's the best way to extort an extorter? The genetic algorithm will quickly fall to chi=1 (fair) and ONLY 1...nothing else consistently gives the largest returns for either player. Now the other parameter (phi) - which represents... search? sociality? something? - varies quite a bit and never seems to settle down. But either way, even though this strategy isn't exactly tit-for-tat, chi=1 gives a strategy that is close to it (and even a bit closer to tit-for-tat with forgiveness), which somehow makes the whole thing a bit less interesting.

    (Self-linking warning: )

  2. This made me chuckle: "I knew that the exact complement of computer intelligence, as yin to yang, is Freeman-Dyson-intelligence. So I showed what I had to Freeman."

    We should all have a Freeman Dyson on our desk.

  3. I think that reflects the result in Appendix A.

    Having just skimmed it, it looks like the extortionate-excess comes from successfully screwing your opponent. If you are going to remember all of your losses, you are going to be vulnerable to being exploited.

    I have seen specific situations where you need to discard your recorded history in order to understand what is going on more clearly now.

  4. efalken 3:57 PM

    I read the part in the Edge commentary (page 5 of 8) that said: "Once both players understand ZD, then each has the power to set the other’s score, independent of the other’s actions. This allows them to make an enforceable treaty, not possible using simpler strategies." This makes me think it is more like tit-for-tat than the Ultimatum Game. How is it not?

  5. Although the optimal play between two smart (ZD-aware) players may resemble TFT in some respects, the dynamics of the game are totally different from what generations of people who studied it had thought. ZD negotiation, perhaps leading to cooperation, is a form of iterated ultimatum game.

    Here is what Press says in the Edge comments:

    "This sounds a bit like TFT, but it is actually quite different. First, while TFT guarantees both players the same score, it does not guarantee them both the maximum score. In fact, both players playing TFT is an unstable situation allowing any possible (mutual) score, while the above “treaty” stably awards both players the maximum. Second, TFT plays out on a rapid, move-by-move timescale. There is no way to pause for reflection or negotiation without losing points irretrievably. The ZD treaty regime instead allows for a whole range of careful, deliberative negotiations. You can never change your own score by unilateral action, and you always retain the future ability to extract any specified penalty from your opponent. That is a world in which diplomacy trumps conflict."

  6. efalken 4:33 PM

    Well, that seems to me to be a matter of interpretation. At some level, players are acting more like economists contemplating strategies in an Axelrod tournament, deciding on both playing TFT (with negotiations). In fact, why not think this exposes a repeated Ultimatum Game as having the tit-for-tat equilibrium from the Prisoner's Dilemma?

  7. Carson Chow 11:44 PM

    I think Dyson qualifies more as a mathematician.

  8. Seems unfortunate to define "evolutionary opponents" as "opponents lacking the ability to condition their opponent". Evolution can generate (and has generated) such mindful individuals. But let's take it to the next level: organisms should evolve an innate tendency to use (not just respond to) operant conditioning to cheaply steer behaviour away from whatever decreases the total gain in the system. They should also evolve to deploy "cooperation pumps" around their group: barriers through which cooperators are invited in, with individuals who cheat and turn the world into an ultimatum game being ostracised. Exploiters will evolve strategies to defeat such cooperative return-raising devices by supporting policies that stop people from punishing exploitative behaviour and, especially, from choosing who enters or is forced to leave the system.

  9. I think Dyson is talking about cooperation pumps in the last quote about "double selection" (within and between tribes).

    I agree that the definition of "evolutionary opponents" is a bit problematic (although clearly Press and Dyson are aware of this).

  10. Kiwanda 6:39 PM

    Minor typo: sentence should read "Once again, physicists invade adjacent field and think they add value."

  11. Apparently the biologists who wrote the PNAS commentary agree with the physicists.

  12. I guess this post is as good as any for this one: