
Cooperation in the iterated prisoner's dilemma is learned by operant conditioning mechanisms

Published: 01 September 2004

Abstract

The prisoner's dilemma (PD) is the leading metaphor for the evolution of cooperative behavior in populations of selfish agents. Although cooperation in the iterated prisoner's dilemma (IPD) has been studied for over twenty years, most of this research has focused on strategies involving nonlearned behavior. An alternative approach is to suppose that a player's selection of the preferred reply might be reinforced in the same way that foraging animals learn to track the best way to feed in changing, nonstationary environments. Learning mechanisms such as operant conditioning enable animals to acquire relevant characteristics of their environment in order to obtain reinforcement and avoid punishment. In this study, the role of operant conditioning in the learning of cooperation was evaluated in the PD. We found that operant mechanisms allow the learning of IPD play against other strategies. When random moves are allowed in the game, the operant learning model showed low sensitivity to this noise. On the basis of this evidence, we suggest that operant learning might be involved in reciprocal altruism.

References

  1. Arita, T., & Suzuki, R. (2000). Interactions between learning and evolution: The outstanding strategy generated by the Baldwin effect. In Proceedings of Artificial Life VII (pp. 196-205).
  2. Ainslie, G. W. (1974). Impulse control in pigeons. Journal of the Experimental Analysis of Behavior, 21, 485-489.
  3. Amsel, A. (1992). Frustration theory. Cambridge, UK: Cambridge University Press.
  4. Ashby, W. R. (1952, 1960). Design for a brain. New York: Wiley.
  5. Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211, 1390-1396.
  6. Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
  7. Baldwin, J. M. (1896). A new factor in evolution. American Naturalist, 30, 441-451, 536-553.
  8. Bates, J. F., & Goldman-Rakic, P. S. (1993). Prefrontal connections of medial motor areas in the rhesus monkey. Journal of Comparative Neurology, 336, 211-228.
  9. Bolles, R. C. (1969). Avoidance and escape learning: Simultaneous acquisition of different responses. Journal of Comparative and Physiological Psychology, 68, 355-358.
  10. Boyd, R. (1989). Mistakes allow evolutionary stability in the repeated prisoner's dilemma game. Journal of Theoretical Biology, 136, 47-56.
  11. Brembs, B. (1996). Chaos, cheating and cooperation: Potential solutions to the prisoner's dilemma. Oikos, 76, 14-24.
  12. Clements, K. C., & Stephens, D. W. (1995). Testing models of non-kin cooperation: Mutualism and the prisoner's dilemma. Animal Behaviour, 50, 527-549.
  13. Connor, R. C. (1986). Pseudo-reciprocity: Investing in mutualism. Animal Behaviour, 34, 1562-1566.
  14. Crespi, L. P. (1942). Quantitative variation in incentive and performance in the white rat. American Journal of Psychology, 55, 467-517.
  15. Darwin, C. (1859). The origin of species. London: John Murray.
  16. Dragoi, V., & Staddon, J. E. R. (1999). The dynamics of operant conditioning. Psychological Review, 106, 20-61.
  17. Dugatkin, L. A. (1997). Cooperation among animals: An evolutionary approach. Oxford Series in Ecology and Evolution. Oxford, UK: Oxford University Press.
  18. Flood, M., Lendenmann, K., & Rapoport, A. (1983). 2 × 2 games played by rats: Different delays of reinforcement as payoffs. Behavioral Science, 28, 65-78.
  19. Fuster, J. M. (1997). The prefrontal cortex: Anatomy, physiology, and neuropsychology of the frontal lobe (p. 25). Philadelphia: Lippincott-Raven.
  20. Gardner, R. M., Corbin, T. L., Beltramo, J. S., & Nickell, G. S. (1984). The prisoner's dilemma game and cooperation in the rat. Psychological Reports, 55, 687-696.
  21. Green, L., Price, P. C., & Hamburger, M. E. (1995). Prisoner's dilemma and the pigeon: Control by immediate consequences. Journal of the Experimental Analysis of Behavior, 64, 1-17.
  22. Grossman, K. E. (1973). Continuous, fixed-ratio, and fixed-interval reinforcement in honey bees. Journal of the Experimental Analysis of Behavior, 20, 105-109.
  23. Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum (Ed.), Handbook of physiology: The nervous system (pp. 373-417). Bethesda, MD: American Physiological Society.
  24. Hamilton, W. D. (1964). The genetical evolution of social behaviour I. Journal of Theoretical Biology, 7, 1-16.
  25. Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
  26. Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
  27. Herrnstein, R. J. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49-69.
  28. Herrnstein, R. J., & Hineline, P. N. (1966). Negative reinforcement as shock-frequency reduction. Journal of the Experimental Analysis of Behavior, 9, 421-430.
  29. Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.
  30. Kacelnik, A., Krebs, J. R., & Ens, B. (1987). Foraging in a changing environment: An experiment with starlings. In M. L. Commons, A. Kacelnik, & S. L. Shettleworth (Eds.), Quantitative analyses of behavior, Vol. 6: Foraging (pp. 63-87). Hillsdale, NJ: Erlbaum.
  31. Kamin, L. J. (1957). The gradient of delay of secondary reward in avoidance learning. Journal of Comparative and Physiological Psychology, 50, 445-449.
  32. Lew, S. E., Wedemeyer, C., & Zanutto, B. S. (2001). Role of unconditioned stimulus prediction in operant learning: A neural network model. In Proceedings of the IEEE Conference on Neural Networks (pp. 331-336).
  33. Mackintosh, N. J. (1974). The psychology of animal learning. San Diego, CA: Academic Press.
  34. Macy, M. W., & Flache, A. (2002). Learning dynamics in social dilemmas. Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl. 3), 7229-7236.
  35. Mazur, J. E. (1984). Test of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes, 10, 426-436.
  36. Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior, Vol. 5: The effect of delay and intervening events on reinforcement value (pp. 55-73). Hillsdale, NJ: Erlbaum.
  37. Mowrer, O. H. (1947). On the dual nature of learning: A reinterpretation of conditioning and problem solving. Harvard Educational Review, 17, 102-148.
  38. Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature, 364, 56-58.
  39. Overmier, J. B., & Seligman, M. E. P. (1967). Effects of inescapable shock upon subsequent escape and avoidance responding. Journal of Comparative and Physiological Psychology, 63, 28-33.
  40. Pear, J. J. (2001). The science of learning. Philadelphia: Psychology Press.
  41. Pycock, C. J., Kerwin, R. W., & Carter, C. J. (1980). Effect of lesion of cortical dopamine terminals on subcortical dopamine receptors in rats. Nature, 286, 74-76.
  42. Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15-22.
  43. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts.
  44. Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35, 395-405.
  45. Sandholm, T., & Crites, R. H. (1995). Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37, 147-166.
  46. Schmajuk, N., & Zanutto, B. S. (1997). Escape, avoidance and imitation: A neural network approach. Adaptive Behavior, 6, 63-129.
  47. Schmajuk, N., Urry, D., & Zanutto, B. S. (1998). The frightening complexity of avoidance: A neural network approach. In C. Wynne & J. Staddon (Eds.), Models of action: Mechanisms for adaptive behavior. Hillsdale, NJ: Erlbaum.
  48. Schneirla, T. C. (1943). The nature of ant learning: II. The intermediate stage of segmental maze adjustment. Journal of Comparative Psychology, 34, 149-176.
  49. Schultz, W. (2002). Getting formal with dopamine and reward. Neuron, 36, 241-263.
  50. Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1598.
  51. Seligman, M. E. P., & Johnston, J. C. (1973). A cognitive theory of avoidance learning. In F. J. McGuigan & D. B. Lumsden (Eds.), Contemporary approaches to conditioning and learning. Washington, DC: Winston-Wiley.
  52. Seligman, M. E. P., Rosellini, R. A., & Kozak, M. J. (1975). Learned helplessness in the rat: Time course, immunization and reversibility. Journal of Comparative and Physiological Psychology, 88, 542-547.
  53. Simpson, G. G. (1953). The Baldwin effect. Evolution, 7, 110-117.
  54. Solomon, R. L., & Wynne, L. C. (1953). Traumatic avoidance learning: Acquisition in normal dogs. Psychological Monographs, 67, 354.
  55. Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135-170.
  56. Staddon, J. E. R. (1983). Adaptive behavior and learning. Cambridge, UK: Cambridge University Press.
  57. Staddon, J. E. R., & Zhang, Y. (1991). On the assignment-of-credit problem in operant learning. In M. L. Commons, S. Grossberg, & J. E. R. Staddon (Eds.), Neural network models of conditioning and action. Hillsdale, NJ: Erlbaum.
  58. Staddon, J. E. R., & Zanutto, B. S. (1997). Feeding dynamics: Why rats eat in meals and what this means for foraging and feeding regulation. In M. E. Bouton & M. S. Fanselow (Eds.), Learning, motivation, and cognition: The functional behaviorism of Robert C. Bolles (pp. 131-162). Washington, DC: American Psychological Association.
  59. Staddon, J. E. R., & Zanutto, B. S. (1998). In praise of parsimony. In C. Wynne & J. Staddon (Eds.), Models of action: Mechanisms for adaptive behavior. Hillsdale, NJ: Erlbaum.
  60. Stephens, D. W., & Clements, K. C. (1995). Game theory and learning: The law of effect and altruistic cooperation. In L. A. Dugatkin & H. K. Reeve (Eds.), Advances in game theory and the study of animal behavior. Oxford, UK: Oxford University Press.
  61. Stephens, D. W., McLinn, C. M., & Stevens, J. R. (2002). Discounting and reciprocity in an iterated prisoner's dilemma. Science, 298, 2216-2218.
  62. Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplement, 2(8).
  63. Thorndike, E. L. (1911). Animal intelligence. New York: Macmillan.
  64. Trivers, R. L. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 46, 35-57.
  65. Trivers, R. L. (1985). Social evolution. Menlo Park, CA: Benjamin Cummings.
  66. Waelti, P., Dickinson, A., & Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412, 43-48.
  67. Watanabe-Sawaguchi, K., Kubota, K., & Arikuni, T. (1991). Cytoarchitecture and intrafrontal connections of the frontal cortex of the brain of the Hamadryas baboon (Papio hamadryas). Journal of Comparative Neurology, 311, 108-133.
  68. Watkins, C. J. (1989). Learning from delayed rewards. Ph.D. dissertation, Psychology Department, Cambridge University.
  69. Wiener, N. (1948, 1961). Cybernetics: Or control and communication in the animal and the machine. Cambridge, MA: MIT Press.
  70. Wilson, D. S. (1975). A theory of group selection. Proceedings of the National Academy of Sciences of the U.S.A., 72, 143-146.
  71. Zanutto, B. S., & Lew, S. (2000). A neural network model of aversive behavior. In M. H. Hamza (Ed.), Proceedings of the IASTED Neural Networks NN'2000 (pp. 118-123). Zürich: IASTED/ACTA Press.
