Linster has conducted simulations of evolutionary PDs among the strategies that can be represented by two-state Moore machines. Since the strategies are deterministic, we must distinguish between the versions that cooperate on the first round and those that defect on the first round. Linster simulated a variety of EPD tournaments among the two-state strategies.
In some, a penalty was levied for increased complexity in the form of reduced payoffs for machines requiring more states or more links. As one might expect, results vary somewhat depending on conditions.
There are some striking differences, however, between Linster's results and those of Nowak and Sigmund. Most notably, GRIM performed strongly for Linster, although it is a strategy whose imperfect variants seem to have been remarkably uncompetitive for Nowak and Sigmund. GRIM cooperates until its opponent has defected once, and then defects for the rest of the game. According to Skyrms and Vanderschraaf, both Hobbes and Hume identified it as the strategy that underlies our cooperative behavior in important PD-like situations.
The explanation for the discrepancy between GRIM's strong performance for Linster and its poor performance for Nowak and Sigmund probably has to do with its sharp deterioration in the presence of error: an accidental defection by either player sends two GRIM players into relentless mutual defection. Thus, in the long run imperfect GRIM does poorly against itself. Note that imperfect GRIM is also likely to do poorly against imperfect versions of other such retaliatory strategies. The observation that evolution might lead to a stable mix of strategies, each perhaps serving to protect others against particular types of invaders, rather than to a single dominant strategy, is suggestive.
Equally suggestive is the result, obtained under a few special conditions, in which evolution leads to a recurring cycle of population mixes. One might expect it to be possible to predict the strategies that will prevail in EPDs meeting various conditions, and to justify such predictions by formal proofs. Selten includes an example of a game with no evolutionarily stable strategy, and Selten's argument that there is no such strategy clearly applies to the EPD and other evolutionary games.
Boyd and Lorberbaum and Farrell and Ware present still different proofs demonstrating that no strategies for the EPD are evolutionarily stable. Unsurprisingly, the paradox is resolved by observing that the three groups of authors each employ slightly different conceptions of evolutionary stability. The conceptual tangle is unraveled in a series of papers by Bendor and Swistak. Two central stability concepts are described and applied to the EPD below.
Readers who wish to compare these with some others that appear in the literature may consult the following brief guide. An evolutionary game has usn-stability just in case it meets a simple condition on payoffs identified by Maynard Smith, labeled MS below.
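In a standard rendering (the notation is supplied here, with \(U(x, y)\) the payoff to strategy \(x\) against strategy \(y\) and \(n\) the native strategy), MS requires that for every potential invader \(m \neq n\), either

\[
U(n, n) > U(m, n), \quad \text{or} \quad U(n, n) = U(m, n) \ \text{and} \ U(n, m) > U(m, m).
\]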
In words, MS says that any invaders either do strictly worse against the natives than the natives themselves do, or else get exactly the same payoff against the natives as the natives do, while the natives do better against the invaders than the invaders do against each other. No strategy for the EPD can meet this condition: for any native strategy there are variants that differ from it only in situations that never arise in play against it, and such variants earn exactly what the natives earn against everyone. This argument, of course, uses the assumption that any strategy in the iterated game is a possible invader. There may be good reason to restrict the available strategies.
For example, if the players are assumed to have no knowledge of previous interactions, then it may be appropriate to restrict available strategies to the unconditional ones. Since a pair of players then get the same payoffs in every round of an iterated game, we may as well take each round of the evolutionary game to consist of one-shot games between every pair of players rather than iterated games.
Indeed, this is the kind of evolutionary game that Maynard Smith himself considered. Thus MS and usn-stability are non-trivial conditions in some contexts. The second stability concept, rwb-stability, turns out to be equivalent to a weakened version of MS identified by Bendor and Swistak, labeled BS below. BS and rwb-stability are non-trivial conditions in the more general evolutionary framework: strategies for the EPD that satisfy rwb-stability do exist. This does not particularly vindicate any of the strategies discussed above, however.
Bendor and Swistak prove a result analogous to the folk theorem mentioned previously: If the shadow of the future is sufficiently large, there are rwb-stable strategies supporting any degree of cooperation from zero to one. One way to distinguish among the strategies that meet BS is by the size of the invasion required to overturn the natives, or, equivalently, by the proportion of natives required to maintain stability. They maintain that this result does allow them to begin to provide a theoretical justification for Axelrod's claims.
They are able to show that, as the shadow of the future approaches one, any strategy that is nice (meaning that it is never the first to defect) and retaliatory (meaning that it always defects immediately after it has been defected against) has a minimal stabilizing frequency approaching one half, the lowest value possible; such strategies are maximally robust. TFT has both these properties and, in fact, they are the first two of the four properties Axelrod cited as instrumental to TFT's success.
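To see what a minimal stabilizing frequency involves, consider a small computation (an illustrative sketch, not Bendor and Swistak's machinery): natives play TFT, and invaders defect against the natives while cooperating with their own kind, as a team with a secret handshake might, with the shadow of the future represented by a discount factor delta.

```python
# Illustrative computation of a minimal stabilizing frequency.
# Natives play TFT; invaders defect against natives but cooperate
# with their own kind.  Standard payoffs T, R, P, S = 5, 3, 1, 0.
T, R, P, S = 5, 3, 1, 0

def min_stabilizing_fraction(delta, step=1e-4):
    """Smallest fraction x of TFT natives whose average score
    at least matches that of the invaders."""
    coop_forever = R / (1 - delta)                    # TFT vs TFT, and invader vs invader
    native_vs_invader = S + delta * P / (1 - delta)   # suckered once, then mutual defection
    invader_vs_native = T + delta * P / (1 - delta)   # exploits once, then mutual defection
    x = 0.0
    while x <= 1.0:
        native = x * coop_forever + (1 - x) * native_vs_invader
        invader = x * invader_vs_native + (1 - x) * coop_forever
        if native >= invader:      # natives (weakly) hold their own
            return round(x, 4)
        x += step
    return None

for delta in (0.9, 0.99, 0.999):
    print(delta, min_stabilizing_fraction(delta))
# As delta approaches 1, the required native fraction approaches 1/2.
```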
Bendor and Swistak's results must be interpreted with some care. First, imperfect versions of TFT do not satisfy rwb-stability: they can be overthrown by arbitrarily small invasions of deterministic TFT or, indeed, by arbitrarily small invasions of any less imperfect TFT. Second, one must remember that the results about minimal stabilizing frequencies only concern weak stability.
If the number of generations is large compared with the original population (as it often is in biological applications), a population that is initially composed entirely of players employing the same maximally robust strategy could well admit a sequence of small invading groups that eventually reduces the original strategy to less than half of the population. At that point the original strategy could be overthrown. A strategy requiring a large invasion to overturn is likely to prevail longer than a strategy requiring only a small invasion.
Since the simulations required imperfection and since they generated a sequence of mutants vastly larger than the original population, there is no real contradiction here. Nevertheless the discrepancy suggests that we do not yet have a theoretical understanding of EPDs sufficient to predict the strategies that will emerge under various plausible conditions. Like usn-stability, the concept of rwb-stability can be more discriminating if it is relativized to a particular set of strategies.
The significance of results like these, however, depends on the plausibility of such limitations on the set of permissible strategies.
In most human interactions that come to mind, a refusal to engage with a particular partner does not represent quite the same loss of opportunity to engage with another as a choice to engage does. If I buy a car from an unscrupulous dealer, I'll have to wait a long time before my next car purchase to do better; but if I refuse to engage with her I can immediately begin negotiating with a neighboring dealer. Nevertheless, there may be situations among people and more likely among non-human animals or among nations or corporations that are appropriately modeled by evolutionary versions of the optional PD.
None of these strategies meets the BS condition, and so no strategy is rwb-stable within this family. Adding the option of not-playing to the evolutionary PD does permit escape from the unhappy state of universal defection, but leads to an only slightly less undesirable outcome in which a population cycles repeatedly through states of universal non-engagement.
DA cooperates with any player that has never defected against it, and otherwise refuses to engage. Simulations among agents who are permitted any strategy in which a move may depend on the two previous moves of the opponent are said to provide rough corroboration.
Some caution is in order here. There is little analysis of which strategies underlie the cooperating populations in the simulations and, indeed, DA is not an option for an agent whose memory goes back only two games.
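For concreteness, here is a minimal sketch of the DA rule as just described (the class structure is illustrative, not drawn from the original simulations). Note that it must remember every partner who has ever defected against it, which is exactly what a two-game memory cannot supply.

```python
class DA:
    """Cooperate with any player that has never defected against
    us; refuse to engage with anyone who ever has."""
    def __init__(self):
        self.blacklist = set()   # ids of players who have defected against us

    def choose(self, opponent_id):
        # Returns "cooperate" or "abstain" (refuse to engage).
        if opponent_id in self.blacklist:
            return "abstain"
        return "cooperate"

    def observe(self, opponent_id, opponent_move):
        if opponent_move == "defect":
            self.blacklist.add(opponent_id)
```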
Oddly, slightly less cooperativity is reported for the fully optional version of the game than for the semi-optional, though in each case, as would be expected, cooperativity is significantly greater than for the ordinary PD. The evolutionary dynamics and the measures of cooperativity employed are sufficiently idiosyncratic to make comparisons with other work difficult. Despite all these caveats, it seems safe to conclude that taking engagement to be optional can provide another explanation for the fact that universal, unrelenting defection is rarely seen in patterns of interaction sometimes modeled as evolutionary PDs.
When Kendall et al began organizing their IPD tournaments to mark the 20th anniversary of the publication of Axelrod's influential book, they received an innocent-seeming inquiry: could one entrant make multiple submissions? If they did not immediately realize the significance of this question, they must surely have done so when a group from the Technical University of Graz attempted to enter a large number of individually named strategies in the first tournament.
Most of these aspiring entries were disallowed. As it turned out, however, the winning strategy came from a group from the University of Southampton, who themselves submitted over half of the strategies that were allowed. The Southampton entries comprised a "master" strategy and a team of "enablers", which used their early moves to identify one another. Thereafter the enablers always cooperate against the master, allowing themselves to be exploited, and defect against all others, thereby lowering the scores of the master's competitors. The master defects against the enablers and plays a reasonable strategy like TFT against all others.
Under these circumstances the score of the master depends on only two factors: the size of its enabling army, and the accuracy and cost of the identifying code sequence.
Cost is the payoff value lost by using early moves to signal one's identity rather than to follow a more productive strategy. Longer codes produce greater accuracy at greater cost. Coming to a better appreciation of these ideas, Kendall et al organized additional tournaments, one restricting each author to a single entry and another restricting each author to a team of twenty entries (though, as Slany and Kienreich observe, such restrictions are difficult or impossible to enforce).
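A toy version of the scheme may make this concrete. The five-move code and all structural details below are invented for illustration (the actual Southampton protocols were more sophisticated), and this version has enablers cooperate with any code-sender, one of the variants discussed below.

```python
CODE = [0, 1, 1, 0, 1]   # hypothetical identifying sequence: 0 = cooperate, 1 = defect

def team_move(role, history_mine, history_theirs):
    """Move for a team member ('master' or 'enabler').  Histories are
    lists of 0/1 moves already played, oldest first."""
    n = len(history_mine)
    if n < len(CODE):
        return CODE[n]                       # spend early moves transmitting the code
    if history_theirs[:len(CODE)] == CODE:   # opponent sent the code: a teammate
        # The master exploits enablers; enablers feed the master
        # (and, in this variant, each other).
        return 1 if role == "master" else 0
    # Against outsiders: the master plays TFT-like; enablers defect
    # to drag the outsiders' scores down.
    return history_theirs[-1] if role == "master" else 1
```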
One may well wonder whether this sort of signaling and team play has any importance beyond showing competitive scholars how to win round-robin IPD tournaments. In an evolutionary setting armies of enablers would rapidly head towards extinction, leaving a master strategy to face its high-scoring competitors alone.
Exploitative team relationships of this kind do persist in nature, however. Presumably, in these cases the exploiters transfer enough to the exploited to ensure the latter's continued availability. Perhaps such payoff transfers within teams should be permitted in IPD tournaments intended to explore these issues. Even without such rule changes, however, there are less extreme forms of team play that would perform better in an evolutionary setting.
If one allows enablers to recognize and cooperate with one another, they will gain considerably with no loss to their master, except when an enabler wrongly identifies an outsider as one of its kind. If one allowed them to play reasonable strategies against outsiders, they would gain still more, though the risk to their master through outsiders' gain would be considerably greater.
Slany and Kienreich (the Graz group) label these approaches EW, EP, DW, and DP and observe, among other properties, that for teams of equal and sufficiently large size this order mirrors the order of the best-performing member of the team, from best to worst. The possibility of error raises special difficulties for team play with signaling of this kind: an incorrect signal could be accidentally sent, or a correct signal could be misinterpreted.
Rogers et al (the Southampton group) realized that the problem of sending and receiving signals when error is possible is a well-studied problem in computer science: reliable communication over a noisy channel. In each of the two competitions, one of the IPD tournaments organized by Kendall et al introduced noise to simulate the possibility of error. By employing some of the standard error-correcting codes designed to deal with communication over a noisy channel as their signaling protocol, the Southampton group won both by a comfortable margin.
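The simplest error-correcting scheme of this kind is a repetition code. The sketch below is illustrative only; the three-fold repetition and the specific signal are assumptions, far simpler than the codes the Southampton group actually used.

```python
def encode(bits, reps=3):
    """Repeat each identity bit; the repeated bits are played out
    as cooperate (0) / defect (1) moves in the opening rounds."""
    return [b for b in bits for _ in range(reps)]

def decode(moves, reps=3):
    """Majority-vote each block of reps observed moves, recovering
    the signal even if noise flips an occasional move."""
    blocks = [moves[i:i + reps] for i in range(0, len(moves), reps)]
    return [1 if sum(block) > reps // 2 else 0 for block in blocks]

# A channel that flips each move with probability 0.1 corrupts a raw
# 4-bit signature 1 - 0.9**4, about 34%, of the time; with 3-fold
# repetition a block is misread only if 2 or more of its 3 moves flip
# (about 2.8% per block, roughly 11% for the whole signature).
signal = [1, 0, 1, 1]
assert decode(encode(signal)) == signal
```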
In IPD tournaments like those of Axelrod and Kendall et al, players know nothing about each other except their moves in the game, and so they can use no other information to signal their membership in a group. In the real world it would seem much more likely that other avenues of communication would be available.
The notion that cooperative outcomes might be facilitated by such communication among players is an old idea in game theory. Santos et al show how this might be possible. Their work borrows from an influential paper by Arthur Robson, in which mutants invade a population by means of a "secret handshake": they send a signal, cooperate with fellow signalers, and treat non-signalers as the natives do. The underlying game may be one, like the stag hunt, in which a population can be stuck at an inferior equilibrium, or it may be a PD. If the underlying game is a PD the population will eventually stabilize with universal defection.
To illustrate the beneficial possibilities of communication, let us suppose the former. Since the signaling players do as well as the originals against outsiders and better against one another, they will soon take over the population. So communication does seem to facilitate cooperation.
If the underlying game is a PD, however, once the new uniform cooperating population has taken over, it is itself vulnerable to mutants who send the signal but defect, and these will eventually predominate. The resulting population can then be infiltrated but not supplanted by other, non-signaling, defectors. So Robson concluded that signaling could move a population from the inferior equilibrium to the superior one in an evolutionary stag hunt, but could only delay the establishment of universal defection in a PD. Santos et al observe, however, that, if a second signal is available, the universally defecting population could be supplanted by a small group of mutants using it as a new secret handshake.
Of course this group is itself vulnerable to mutants who mimic the second signal while defecting against all. In this case, however, the resulting dark age may no longer be permanent.
If a mutant group of signal-one handshakers re-emerges before any signal-one defectors have drifted into the population, they will again take over, and the cycle will be repeated.
If a third signal were available, of course, the return of cooperation would be even easier. The time spent in each state depends on the payoffs of the PD and the number of available signals. Santos et al demonstrate, however, that, for finite populations with sufficiently slow mutation rates and large numbers of available signals, cooperation predominates in EPDs with signaling.
A previous section discussed a controversial argument that cooperation is rational in a PD when each player knows that the other is enough like himself to make it likely that they will choose the same move.
An analog of this argument in the evolutionary context is more obviously cogent. If agents are not paired at random, but rather are more likely to play others employing similar strategies, then cooperative behavior is more likely to emerge.
One way to realize such correlation is to arrange the players in some geometrical configuration and have them interact only with their neighbors. This may be an array with a rectangular boundary, for example, or a circle, or the surface of a sphere or torus with no boundary. From the geographical arrangement, two (possibly identical) kinds of neighborhoods are identified for each player: those with whom it plays, and those with whose scores its own is compared when strategies are updated. This can model either the idea that each player is invaded by its most successful neighbor or the idea that each player adopts the most successful strategy that it sees. As usual, the impetus for looking at SPDs seems to come from Axelrod.
Four copies of each of the 63 strategies submitted to Axelrod's tournament were arranged on a grid with a spherical geometry, so that each cell had four neighbors for both interaction and comparison. For every initial random distribution, the resulting SPD eventually reached a state where the strategy in every cell was cooperating with all its neighbors, at which point no further evolution was possible.
In these end-states only about ten of the 63 original strategies remained. The remaining strategies were no longer randomly distributed, but segregated into clumps of various sizes. Axelrod also showed that under special conditions evolution in an SPD can create successions of complex symmetrical patterns that do not appear to reach any steady-state equilibrium. To get an idea of why cooperative behavior might spread in this and similar frameworks, consider two agents on either side of a frontier between cooperating and non-cooperating subpopulations.
The cooperative agent sees a cooperative neighbor whose four neighbors all cooperate, and who therefore gets four times the reward payoff after playing them all.
So he will imitate this neighbor's strategy and remain cooperative. The non-cooperating agent, on the other hand, sees his cooperative counterpart, who gets three reward payoffs from his cooperative neighbors and one sucker payoff. He compares this to the payoffs of his non-cooperative neighbors. The best these can do is to get three punishments and a temptation. When three rewards and a sucker exceed three punishments and a temptation, as they do for the standard payoff values, the frontier defector will therefore imitate his cooperative counterpart, and cooperation will spread. Spatial games among just the two unconditional strategies, unconditional cooperation and unconditional defection, have been investigated systematically by Nowak and May. These are the strategies appropriate among individuals lacking memory or recognition skills. They find that, for a variety of spatial configurations and distributions of strategies, evolution depends on relative payoffs in a uniform way.
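A quick computation makes the frontier comparison concrete, using the standard payoff values (T, R, P, S = 5, 3, 1, 0) and the four-neighbor geometry described above:

```python
T, R, P, S = 5, 3, 1, 0

# Scores visible from the frontier between a cooperating and a
# defecting region, each cell playing its four neighbors.
interior_cooperator = 4 * R        # cooperator whose neighbors all cooperate
frontier_cooperator = 3 * R + S    # cooperator with one defecting neighbor
frontier_defector = 3 * P + T      # defector's best non-cooperative neighbor

print(interior_cooperator, frontier_cooperator, frontier_defector)  # 12 9 8

# The frontier cooperator imitates the interior cooperator (12 > 9),
# and the frontier defector imitates the frontier cooperator (9 > 8),
# so cooperation advances; raising T relative to R reverses this.
```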
When the temptation payoff is high relative to the others, defection spreads; when it is low, cooperation does; for a narrow range of intermediate values, we get successions of complex patterns like those noted by Axelrod.
The evolving patterns exhibit great variety. The idea that these simulations partially explain the persistence of cooperation in nature has been questioned on the grounds that they assume deterministic error-free moves and updates. But the authors report similar phenomena under a variety of error-conditions, although lower relative temptation values are then required for the survival of the cooperators, and the level of error cannot exceed a certain threshold.
See Mukherji et al, and the reply by Nowak et al that immediately follows it. In general, the observations reported confirm the plausible conjecture that cooperative outcomes are more common in SPDs than in ordinary EPDs.
Simulations have been run starting with all of the pure reactive strategies of Nowak and Sigmund, i.e., the deterministic strategies in which a move depends only on the opponent's previous move. Simulations starting with all of the 64 possible pure strategies in which a move may depend on the opponent's previous two moves ended with mixed populations of survivors employing a variety of TFT-like strategies.
Again, other outcomes are possible. Simulations starting with many strategies, and simulations beginning with a random selection of a few, have also been run for spatial versions of the optional PD; these tend to settle into relatively stable configurations. This contrasts with the continuous cycles for the non-spatialized versions of the evolutionary optional PDs discussed above.
Like the earlier observations, this may help to explain how a group might achieve a state other than universal defection, but not how it might achieve a state of universal cooperation.
For social applications, and probably even for many biological ones, there seems to be no motivation for any particular geometrical arrangement. Nevertheless SPD models of the evolution of cooperation in particular geometrical arrangements have given us some suggestive and pretty pictures to contemplate.
Several examples are accessible through the links at the end of this entry. One way to make the idea of local interaction more realistic for some applications is to let agents choose the partners with whom to interact, based on payoffs in past interactions.
Skyrms considers iterated PDs among a population of unconditional cooperators and defectors. Initially, as usual, each agent chooses a partner at random from the remaining members of the population.
In a typical PD, where the payoffs for temptation, reward, punishment and sucker are 3, 2, 1 and 0, both cooperators and defectors eventually choose only cooperators.
Since the cooperators are chosen by both cooperators and defectors, they play more often than the defectors, who play only when they are doing the choosing.
Defectors can expect a return of one temptation payoff per play, but they play half as often, while cooperators can expect a reward payoff on each of their more frequent plays. In a second scenario that Skyrms considers, the cooperators again quickly learn not to choose defectors as partners. The defectors get roughly the same payoffs whether they choose cooperators or defectors as partners. Since they rapidly cease being chosen by cooperators, however, their returns from interactions with cooperators will be less than their returns from defectors, and they will soon limit their choices to other defectors.
So in the attenuated game we end up with perfect association: defectors play defectors and cooperators play cooperators. Since the reward payoff slightly exceeds the punishment payoff, the cooperators again do better than the defectors. The social network games considered above are not really evolutionary PDs in the sense described above.
The patterns of interaction evolve, but the strategy profile of the population remains fixed. It is natural to allow both strategies and probabilities of interaction to evolve simultaneously as payoffs are distributed. Whether cooperation or defection or neither comes to dominate the population under such conditions depends on a multitude of factors: the values of the payoffs, the initial distribution of strategies, the relative speed of the adjustments in strategy and interaction probabilities, and other properties of those two evolutionary dynamics.
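To fix ideas, here is a minimal sketch of the simpler process described earlier, in which strategies stay fixed and only partner choice evolves. The reinforcement rule below is an assumed stand-in for Skyrms's exact dynamics, with the payoffs 3, 2, 1, 0 used above.

```python
import random

T, R, P, S = 3, 2, 1, 0
strategies = ["C"] * 5 + ["D"] * 5
N = len(strategies)

def payoff(me, other):
    if me == "C":
        return R if other == "C" else S
    return T if other == "C" else P

# weights[i][j]: agent i's propensity to choose agent j as a partner
weights = [[0.0 if i == j else 1.0 for j in range(N)] for i in range(N)]

for _ in range(5000):
    for i in range(N):
        j = random.choices(range(N), weights=weights[i])[0]
        # Both parties play and reinforce by the payoff each receives.
        weights[i][j] += payoff(strategies[i], strategies[j])
        weights[j][i] += payoff(strategies[j], strategies[i])

for i in range(N):
    favorite = max(range(N), key=lambda j: weights[i][j])
    print(strategies[i], "prefers", strategies[favorite])
# With these payoffs both cooperators and defectors come to favor
# cooperating partners, matching the first scenario described above.
```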
Skyrms contains a general discussion and a number of suggestive examples, but it does not provide, or aim to provide, a comprehensive account of social network PDs or a careful analysis of which precise formulations properly model particular phenomena.
Much remains unknown. In a social network game, agents choose from a population of potential opponents; in the version of the IPD that interested Axelrod, agents must play every other member of the population of which they are a part. The original description of the IPD by Dresher and Flood, however, concerned a single pair of players who repeatedly play the same PD game.
In a brief but influential paper, a pair of distinguished physicists, William Press and Freeman Dyson, recently returned attention to this original version of the IPD, or rather to the infinitely repeated version of it.
In other versions of the IPD, where pairs from a larger population come together repeatedly to play the game, a successful strategy is one that scores well. Under those conditions, it is much more important, in a particular round of the game, to raise your own score than to lower your opponent's. Axelrod repeatedly and with cause advised participants in his tournaments not to be envious.
In the 2IPD, however, the population size is two. In that case it is as valuable to lower your opponent's payoff as to raise your own, and it can even be beneficial to lower your own payoff, if doing so lowers your opponent's more than yours.
Another noteworthy feature of the 2IPD, proved rigorously in Press and Dyson (Appendix A), is that a long memory is unnecessary to play well. Suppose I adopt a memory-one strategy, i.e., a strategy in which my probability of cooperating on each round depends only on the pair of moves made on the previous round. Then Press and Dyson show that you can't benefit by using a longer memory: whatever strategy you adopt, there is an equivalent memory-one strategy you could have adopted that would net us both the same scores.
By adopting a memory-one strategy myself, I ensure that a longer memory will be of no benefit to you. Thus we can, without loss of generality, take the 2IPD to be a game between agents with memory-one strategies. A 2IPD between memory-one agents (and indeed any two-player, two-move game between memory-one agents) can be represented in a particularly perspicuous way.
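The representation, reconstructed here along the lines of Press and Dyson's setup, treats each memory-one strategy as a vector of cooperation probabilities conditioned on the previous round's outcome:

\[
p = (p_1, p_2, p_3, p_4), \qquad q = (q_1, q_2, q_3, q_4),
\]

where the four entries give the probability of cooperating after the outcomes CC, CD, DC, and DD respectively, each player listing his own move first. Joint play is then a Markov chain over the four outcomes with transition matrix

\[
M = \begin{pmatrix}
p_1 q_1 & p_1(1-q_1) & (1-p_1)q_1 & (1-p_1)(1-q_1)\\
p_2 q_3 & p_2(1-q_3) & (1-p_2)q_3 & (1-p_2)(1-q_3)\\
p_3 q_2 & p_3(1-q_2) & (1-p_3)q_2 & (1-p_3)(1-q_2)\\
p_4 q_4 & p_4(1-q_4) & (1-p_4)q_4 & (1-p_4)(1-q_4)
\end{pmatrix},
\]

with states ordered CC, CD, DC, DD from the first player's viewpoint (so the second player's indices for the two middle states are swapped), and the long-term average payoffs are computed from the stationary distribution of \(M\).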
Viewing a game in this way makes it possible to apply the machinery of matrix algebra and Markov chains, which led Press and Dyson to the identification of the class of Zero-Determinant (ZD) strategies. A simpler proof of Press and Dyson's central result, employing more modest mathematical machinery, is given in Appendix A of Hilbe et al. A ZD strategy is a strategy by which a player can ensure a fixed linear relation between his own long-term average payoff and his opponent's.
Suppose, for example, that I play TFT, which is itself a ZD strategy enforcing the relation that our long-term average payoffs are equal. If you cooperate on every round, we will each average the reward payoff; if you defect on every round, we will each average the punishment payoff. For other choices, you may get a payoff between the punishment and reward. Whatever you choose, however, you will still get the same payoff as me. There are a variety of such ZD strategies for the IPD and indeed for most two-player, two-move games.
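As a more striking illustration, here is a numerical check, using the Markov-chain representation above, of an extortionate ZD strategy with extortion factor 2, computed from Press and Dyson's formulas for the standard payoffs 5, 3, 1, 0 (the specific probability vector is derived here, not quoted from the text):

```python
import numpy as np

T, R, P, S = 5, 3, 1, 0   # standard PD payoffs used in the text

def stationary(p, q):
    """Stationary distribution over outcomes (CC, CD, DC, DD), viewed
    from player X, when memory-one strategies p and q play each other.
    q is read from Y's viewpoint, so its CD and DC entries swap."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    M = np.array([
        [p1*q1, p1*(1-q1), (1-p1)*q1, (1-p1)*(1-q1)],
        [p2*q3, p2*(1-q3), (1-p2)*q3, (1-p2)*(1-q3)],
        [p3*q2, p3*(1-q2), (1-p3)*q2, (1-p3)*(1-q2)],
        [p4*q4, p4*(1-q4), (1-p4)*q4, (1-p4)*(1-q4)],
    ])
    w, v = np.linalg.eig(M.T)                 # left eigenvectors of M
    vec = np.real(v[:, np.argmin(np.abs(w - 1))])
    return vec / vec.sum()

# "Extort-2": X's surplus over the punishment value is always double
# Y's, i.e., sX - P = 2 * (sY - P), whatever Y plays.
extort2 = (8/9, 1/2, 1/3, 0)

for q in [(1, 1, 1, 1), (1, 0, 1, 0), (0.9, 0.7, 0.2, 0.1)]:
    d = stationary(extort2, q)
    sX = d @ np.array([R, S, T, P])           # X's long-run average payoff
    sY = d @ np.array([R, T, S, P])           # Y's long-run average payoff
    print(f"vs q={q}:  sX - P = {sX - P:.3f},  2*(sY - P) = {2*(sY - P):.3f}")
```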
Three other representative ZD strategies for a standard PD with payoffs 5, 3, 1, 0 have been described in the literature. In the memory-one 2IPD a player can set his opponent's score to any value between the punishment and reward payoffs; see also Hilbe et al. Of course, a more witting Player Two might realize that the same dictatorial strategies are available to her.
These will be of no use, however, unless they lead to a shift in Player One's behavior. If the existence of the dictator strategies is common knowledge for the two players, then they might profitably agree to set each other's scores to the reward payoff. Since each is employing a dictator strategy, neither can benefit in the short term by deviating.
If either deviates in hopes of a long-term gain, the other could detect it by the change in his or her own payoff and take retaliatory action. Whether such an agreement is stable, of course, depends on whether the players can make their threats of retaliation credible. Suppose instead that Player One adopts an extortionate ZD strategy, like the Extort-2 strategy above, which ensures that his own gain over the punishment value is always a fixed multiple of Player Two's. Player Two can, of course, guarantee herself a return of at least one by constant defection. If she does, Player One will defect with increasing frequency and their average payoffs will both approach the punishment value.
Because of the linear relation that links their payoffs, however, if she does better than this, she will necessarily lose to Player One: with the extortion factor set at two, any increment above punishment in her own score will be only half the corresponding increment to her opponent's. No tricks are needed: whatever she does to increase her own payoff will, of necessity, increase the extortionist's by double the amount.
She might instead resist, defecting to deny the extortionist his unequal gains. This would lower both of their payoffs in the short term, but she might hope for better results in the long term. The extortionist proposes an unfair division of the joint payoffs, leaving his opponent with the unhappy choice of accepting it or making both players worse off. It is perhaps worth noting that this analysis omits the possibility that the extorted party is aware of the payoffs to her opponent as well as herself, and, realizing that the IPD is being played between just two agents, seeks to minimize the difference between her opponent's payoff and her own.
Adopting such an attitude would presumably lead her to a strategy of unconditional defection. The payoffs of both players would then approach the punishment value, the extortionist's from below and the extorted's from above. Neither dictatorial nor extortionary strategies would seem likely to fare well in an evolutionary setting with larger populations. A related multi-player version of the dilemma is the tragedy of the commons, in which each individual has an incentive to draw on a shared resource at the expense of the others. This leads to over-consumption and ultimately depletion of the common resource, to everybody's detriment. Basically, it highlights the problem of individuals neglecting the well-being of society in the pursuit of personal gain.
In the real world most economic and other human interactions are repeated more than once. This allows parties to choose strategies that reward cooperation or punish defection over time.
Another solution relies on developing formal institutional strategies to alter the incentives that individual decision makers face. The prisoner's dilemma can sometimes actually make society better off as a whole. A prime example is the behavior of an oil cartel. All cartel members can collectively enrich themselves by restricting output to keep the price of oil at a level where each maximizes revenue received from consumers, but each cartel member individually has an incentive to cheat on the cartel and increase output to capture revenue away from the other cartel members.
The end result is not the optimal outcome that the cartel desires but, rather, an outcome that benefits the consumer in terms of lower oil prices.
For example, assume you are in the market for a new car and you walk into a car dealership. The utility or payoff, in this case, is a non-numerical attribute, i.e., satisfaction with the deal. You want to get the best possible deal in terms of price, car features, etc. Cooperating, in this case, means simply accepting the sticker price; defecting, on the other hand, means bargaining. You want a lower price, while the salesman wants a higher price. Assigning numerical values to the levels of satisfaction, where 10 means fully satisfied with the deal and 0 implies no satisfaction, the payoff matrix is roughly as shown below.
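The original matrix is not reproduced here; the table below uses illustrative satisfaction scores (the particular numbers are assumptions chosen to match the cell descriptions that follow):

                              Salesman cooperates        Salesman defects
    You cooperate             (a) You 5, Salesman 8      (c) You 2, Salesman 10
    You defect (bargain)      (b) You 10, Salesman 2     (d) You 4, Salesman 4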
What does this matrix tell us? If you drive a hard bargain and get a substantial reduction in the car price, you are likely to be fully satisfied with the deal, but the salesman is likely to be unsatisfied because of the loss of commission as can be seen in cell b.
Conversely, if the salesman sticks to his guns and does not budge on price, you are likely to be unsatisfied with the deal while the salesman would be fully satisfied (cell c). Your satisfaction level may be less if you simply walked in and paid full sticker price (cell a). Cell d shows a much lower degree of satisfaction for both buyer and seller, since prolonged haggling may have eventually led to a reluctant compromise on the price paid for the car.
Salary negotiation offers another example. Cooperating by taking the first offer may seem like an easy solution in a difficult job market, but it may result in you leaving some money on the table. Defecting, i.e., negotiating for a higher salary, may fetch you a better package. Conversely, if the employer is not willing to pay more, you may be dissatisfied with the final offer.
Hopefully, the salary negotiations do not turn acrimonious, since that may result in a lower level of satisfaction for you and the employer. The buyer-salesman payoff matrix shown earlier can be easily extended to show the satisfaction level for the job seeker versus the employer. In fact, when shopping for a big-ticket item such as a car, bargaining is the preferred course of action from the consumers' point of view.
Otherwise, the car dealership may adopt a policy of inflexibility in price negotiations, maximizing its profits but resulting in consumers overpaying for their vehicles. Understanding the relative payoffs of cooperating versus defecting may stimulate you to engage in significant price negotiations before you make a big purchase.
But when everyone is selfish, everyone suffers. Peterson illustrates the point with two hypothetical car manufacturers, Row Cars and Col Motors, the only two actors in their market: the price each sells cars at has a direct connection to the price the other sells cars at. If one opts to sell at a higher price than the other, it will sell fewer cars as customers transfer. If one sells at a lower price, it will sell more cars at a lower profit margin, gaining customers from the other.
Peterson writes: "Imagine that you serve on the board of Row Cars. In a board meeting, you point out that irrespective of what Col Motors decides to do, it will be better for your company to opt for low prices." Gregory Mankiw gives another real-world example in Microeconomics:
Consider an oligopoly with two members, called Iran and Saudi Arabia. Both countries sell crude oil. After prolonged negotiation, the countries agree to keep oil production low in order to keep the world price of oil high.
After they agree on production levels, each country must decide whether to cooperate and live up to this agreement or to ignore it and produce at a higher level. A payoff table (not reproduced here) shows how the profits of the two countries depend on the strategies they choose.
Imagine yourself reasoning on behalf of Saudi Arabia: "I could keep production low as we agreed, or I could raise my production and sell more oil on world markets. Suppose Iran lives up to the agreement and keeps its production low. In this case, Saudi Arabia is better off with high production. Suppose instead that Iran ignores the agreement and produces at a high level. Once again, Saudi Arabia is better off with high production. So, regardless of what Iran chooses to do, my country is better off reneging on our agreement and producing at a high level."
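The dominance reasoning in the quoted passage can be checked mechanically. The profit figures below are placeholders, not Mankiw's actual numbers; only their ordering matters.

```python
# Hypothetical profits in billions: payoffs[(saudi, iran)] gives
# (saudi_profit, iran_profit) for each pair of production choices.
payoffs = {
    ("low",  "low"):  (50, 50),   # both honor the agreement
    ("low",  "high"): (30, 60),   # Saudi honors, Iran reneges
    ("high", "low"):  (60, 30),   # Saudi reneges, Iran honors
    ("high", "high"): (40, 40),   # both renege
}

# Verify that "high" is a dominant strategy for Saudi Arabia:
# it yields a higher profit whatever Iran does.
for iran_choice in ("low", "high"):
    low = payoffs[("low", iran_choice)][0]
    high = payoffs[("high", iran_choice)][0]
    print(f"Iran {iran_choice}: Saudi low={low}, high={high} ->",
          "high is better" if high > low else "low is better")
```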