Abstract
The prisoner's dilemma (PD) is the leading metaphor for the evolution of cooperative behavior in populations of selfish agents. Although cooperation in the iterated prisoner's dilemma (IPD) has been studied for over twenty years, most of this research has focused on strategies involving nonlearned behavior. Another approach is to suppose that players' selection of the preferred reply might be reinforced in the same way that foraging animals track the best way to feed in changing, nonstationary environments. Learning mechanisms such as operant conditioning enable animals to acquire relevant characteristics of their environment in order to obtain reinforcement and avoid punishment. In this study, the role of operant conditioning in the learning of cooperation was evaluated in the PD. We found that operant mechanisms allow the learning of IPD play against other strategies. When random moves are allowed in the game, the operant learning model showed low sensitivity to this noise. On the basis of this evidence, we suggest that operant learning might be involved in reciprocal altruism.
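To make the setting concrete, the sketch below shows a simple discounted reinforcement learner (tabular Q-learning, used here only as a stand-in for the operant model described in the paper) playing the IPD against tit-for-tat, with an option to inject the random moves mentioned above. The payoff values (T = 5, R = 3, P = 1, S = 0), the state coding, and all parameters are illustrative assumptions, not the authors' specification.

```python
# Illustrative sketch only: a discounted reinforcement learner standing in
# for the paper's operant model, playing the IPD against tit-for-tat.
# Payoffs, state coding, and parameters are assumptions for this example.
import random

PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ('C', 'D')

def play_against_tit_for_tat(rounds=5000, alpha=0.1, gamma=0.9,
                             epsilon=0.1, noise=0.0):
    """Return the learner's average payoff per round."""
    q = {}                          # state -> {action: estimated value}
    state, my_prev = 'start', None  # state = previous joint outcome
    total = 0.0
    for _ in range(rounds):
        q.setdefault(state, {a: 0.0 for a in ACTIONS})
        # Epsilon-greedy selection of the learner's response.
        if random.random() < epsilon:
            my_move = random.choice(ACTIONS)
        else:
            my_move = max(q[state], key=q[state].get)
        # Tit-for-tat: cooperate first, then copy the learner's previous move.
        opp_move = 'C' if my_prev is None else my_prev
        # Optional random moves ("noise"): each choice may be flipped.
        if random.random() < noise:
            my_move = 'D' if my_move == 'C' else 'C'
        if random.random() < noise:
            opp_move = 'D' if opp_move == 'C' else 'C'
        reward = PAYOFF[(my_move, opp_move)]
        next_state = (my_move, opp_move)
        q.setdefault(next_state, {a: 0.0 for a in ACTIONS})
        # Reinforce the emitted response toward reward + discounted future value.
        target = reward + gamma * max(q[next_state].values())
        q[state][my_move] += alpha * (target - q[state][my_move])
        state, my_prev = next_state, my_move
        total += reward
    return total / rounds

if __name__ == '__main__':
    print(play_against_tit_for_tat())            # no random moves
    print(play_against_tit_for_tat(noise=0.05))  # with occasional random moves
```

Under these assumptions the greedy policy typically settles on mutual cooperation against tit-for-tat, because the discounted value of sustained mutual cooperation exceeds the one-round gain from defection followed by retaliation.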
- 1. Arita, T., & Suzuki, R. (2000). Interactions between learning and evolution: The outstanding strategy generated by the Baldwin effect. In Proceedings of Artificial Life VII (pp. 196-205).
- 2. Ainslie, G. W. (1974). Impulse control in pigeons. Journal of the Experimental Analysis of Behavior, 21, 485-489.
- 3. Amsel, A. (1992). Frustration theory. Cambridge, UK: Cambridge University Press.
- 4. Ashby, W. R. (1952, 1960). Design for a brain. New York: Wiley.
- 5. Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211, 1390-1396.
- 6. Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
- 7. Baldwin, J. M. (1896). A new factor in evolution. American Naturalist, 30, 441-451, 536-553.
- 8. Bates, J. F., & Goldman-Rakic, P. S. (1993). Prefrontal connections of medial motor areas in the rhesus monkey. Journal of Comparative Neurology, 336, 211-228.
- 9. Bolles, R. C. (1969). Avoidance and escape learning: Simultaneous acquisition of different responses. Journal of Comparative and Physiological Psychology, 68, 355-358.
- 10. Boyd, R. (1989). Mistakes allow evolutionary stability in the repeated prisoner's dilemma game. Journal of Theoretical Biology, 136, 47-56.
- 11. Brembs, B. (1996). Chaos, cheating and cooperation: Potential solutions to the prisoner's dilemma. Oikos, 76, 14-24.
- 12. Clements, K. C., & Stephens, D. W. (1995). Testing models of non-kin cooperation: Mutualism and the prisoner's dilemma. Animal Behaviour, 50, 527-549.
- 13. Connor, R. C. (1986). Pseudo-reciprocity: Investing in mutualism. Animal Behaviour, 34, 1562-1566.
- 14. Crespi, L. P. (1942). Quantitative variation in incentive and performance in the white rat. American Journal of Psychology, 55, 467-517.
- 15. Darwin, C. (1859). The origin of species. London: John Murray.
- 16. Dragoi, V., & Staddon, J. E. R. (1999). The dynamics of operant conditioning. Psychological Review, 106, 20-61.
- 17. Dugatkin, L. A. (1997). Cooperation among animals: An evolutionary approach. Oxford Series in Ecology and Evolution. Oxford, UK: Oxford University Press.
- 18. Flood, M., Lendenmann, K., & Rapoport, A. (1983). 2 × 2 games played by rats: Different delays of reinforcement as payoffs. Behavioral Science, 28, 65-78.
- 19. Fuster, J. M. (1997). The prefrontal cortex: Anatomy, physiology, and neuropsychology of the frontal lobe (p. 25). Philadelphia: Lippincott-Raven.
- 20. Gardner, R. M., Corbin, T. L., Beltramo, J. S., & Nickell, G. S. (1984). The prisoner's dilemma game and cooperation in the rat. Psychological Reports, 55, 687-696.
- 21. Green, L., Price, P. C., & Hamburger, M. E. (1995). Prisoner's dilemma and the pigeon: Control by immediate consequences. Journal of the Experimental Analysis of Behavior, 64, 1-17.
- 22. Grossman, K. E. (1973). Continuous, fixed-ratio, and fixed-interval reinforcement in honey bees. Journal of the Experimental Analysis of Behavior, 20, 105-109.
- 23. Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum (Ed.), Handbook of physiology: The nervous system (pp. 373-417). Bethesda, MD: American Physiological Society.
- 24. Hamilton, W. D. (1964). The genetical evolution of social behaviour. I. Journal of Theoretical Biology, 7, 1-16.
- 25. Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
- 26. Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
- 27. Herrnstein, R. J. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49-69.
- 28. Herrnstein, R. J., & Hineline, P. N. (1966). Negative reinforcement as shock-frequency reduction. Journal of the Experimental Analysis of Behavior, 9, 421-430.
- 29. Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.
- 30. Kacelnik, A., Krebs, J. R., & Ens, B. (1987). Foraging in a changing environment: An experiment with starlings. In M. L. Commons, A. Kacelnik, & S. L. Shettleworth (Eds.), Quantitative analyses of behavior: Vol. VI. Foraging (pp. 63-87). Hillsdale, NJ: Erlbaum.
- 31. Kamin, L. J. (1957). The gradient of delay of secondary reward in avoidance learning. Journal of Comparative and Physiological Psychology, 50, 445-449.
- 32. Lew, S. E., Wedemeyer, C., & Zanutto, B. S. (2001). Role of unconditioned stimulus prediction in operant learning: A neural network model. In Proceedings of the IEEE Conference on Neural Networks (pp. 331-336).
- 33. Mackintosh, N. J. (1974). The psychology of animal learning. San Diego, CA: Academic Press.
- 34. Macy, M. W., & Flache, A. (2002). Learning dynamics in social dilemmas. Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl. 3), 7229-7236.
- 35. Mazur, J. E. (1984). Tests of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes, 10, 426-436.
- 36. Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. The effect of delay and intervening events on reinforcement value (pp. 55-73). Hillsdale, NJ: Erlbaum.
- 37. Mowrer, O. H. (1947). On the dual nature of learning: A reinterpretation of conditioning and problem solving. Harvard Educational Review, 17, 102-148.
- 38. Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game. Nature, 364, 56-58.
- 39. Overmier, J. B., & Seligman, M. E. P. (1967). Effects of inescapable shock upon subsequent escape and avoidance responding. Journal of Comparative and Physiological Psychology, 63, 28-33.
- 40. Pear, J. J. (2001). The science of learning. Philadelphia: Psychology Press.
- 41. Pycock, C. J., Kerwin, R. W., & Carter, C. J. (1980). Effect of lesion of cortical dopamine terminals on subcortical dopamine receptors in rats. Nature, 286, 74-76.
- 42. Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15-22.
- 43. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts.
- 44. Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35, 395-405.
- 45. Sandholm, T., & Crites, R. H. (1995). Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37, 147-166.
- 46. Schmajuk, N., & Zanutto, B. S. (1997). Escape, avoidance and imitation: A neural network approach. Adaptive Behavior, 6, 63-129.
- 47. Schmajuk, N., Urry, D., & Zanutto, B. S. (1998). The frightening complexity of avoidance: A neural network approach. In C. Wynne & J. Staddon (Eds.), Models of action: Mechanisms for adaptive behavior. Hillsdale, NJ: Erlbaum.
- 48. Schneirla, T. C. (1943). The nature of ant learning: II. The intermediate stage of segmental maze adjustment. Journal of Comparative Psychology, 34, 149-176.
- 49. Schultz, W. (2002). Getting formal with dopamine and reward. Neuron, 36, 241-263.
- 50. Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1598.
- 51. Seligman, M. E. P., & Johnston, J. C. (1973). A cognitive theory of avoidance learning. In F. J. McGuigan & D. B. Lumsden (Eds.), Contemporary approaches to conditioning and learning. Washington, DC: Winston-Wiley.
- 52. Seligman, M. E. P., Rosellini, R. A., & Kozak, M. J. (1975). Learned helplessness in the rat: Time course, immunization and reversibility. Journal of Comparative and Physiological Psychology, 88, 542-547.
- 53. Simpson, G. G. (1953). The Baldwin effect. Evolution, 7, 110-117.
- 54. Solomon, R. L., & Wynne, L. C. (1953). Traumatic avoidance learning: Acquisition in normal dogs. Psychological Monographs, 67 (Whole No. 354).
- 55. Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135-170.
- 56. Staddon, J. E. R. (1983). Adaptive behavior and learning. Cambridge, UK: Cambridge University Press.
- 57. Staddon, J. E. R., & Zhang, Y. (1991). On the assignment-of-credit problem in operant learning. In M. L. Commons, S. Grossberg, & J. E. R. Staddon (Eds.), Neural network models of conditioning and action. Hillsdale, NJ: Erlbaum.
- 58. Staddon, J. E. R., & Zanutto, B. S. (1997). Feeding dynamics: Why rats eat in meals and what this means for foraging and feeding regulation. In M. E. Bouton & M. S. Fanselow (Eds.), Learning, motivation, and cognition: The functional behaviorism of Robert C. Bolles (pp. 131-162). Washington, DC: American Psychological Association.
- 59. Staddon, J. E. R., & Zanutto, B. S. (1998). In praise of parsimony. In C. Wynne & J. Staddon (Eds.), Models of action: Mechanisms for adaptive behavior. Hillsdale, NJ: Erlbaum.
- 60. Stephens, D. W., & Clements, K. C. (1995). Game theory and learning: The law of effect and altruistic cooperation. In L. A. Dugatkin & H. K. Reeve (Eds.), Advances in game theory and the study of animal behavior. Oxford, UK: Oxford University Press.
- 61. Stephens, D. W., McLinn, C. M., & Stevens, J. R. (2002). Discounting and reciprocity in an iterated prisoner's dilemma. Science, 298, 2216-2218.
- 62. Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplement, 2(8).
- 63. Thorndike, E. L. (1911). Animal intelligence. New York: Macmillan.
- 64. Trivers, R. L. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 46, 35-57.
- 65. Trivers, R. L. (1985). Social evolution. Menlo Park, CA: Benjamin Cummings.
- 66. Waelti, P., Dickinson, A., & Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412, 43-48.
- 67. Watanabe-Sawaguchi, K., Kubota, K., & Arikuni, T. (1991). Cytoarchitecture and intrafrontal connections of the frontal cortex of the brain of the Hamadryas baboon (Papio hamadryas). Journal of Comparative Neurology, 311, 108-133.
- 68. Watkins, C. J. (1989). Learning from delayed rewards. Ph.D. dissertation, Psychology Department, Cambridge University.
- 69. Wiener, N. (1948, 1961). Cybernetics: Or control and communication in the animal and the machine. Cambridge, MA: MIT Press.
- 70. Wilson, D. S. (1975). A theory of group selection. Proceedings of the National Academy of Sciences of the U.S.A., 72, 143-146.
- 71. Zanutto, B. S., & Lew, S. (2000). A neural network model of aversive behavior. In M. H. Hamza (Ed.), Proceedings of the IASTED Neural Networks NN'2000 (pp. 118-123). Zürich: IASTED/ACTA Press.