Abstract
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples---these are among the basic building blocks making up the architecture---yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
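To make the abstract's description concrete, below is a minimal sketch of one common instantiation of the architecture, a tabular Dyna-Q-style agent: it learns from real experience, updates a (possibly incorrect) world model, and performs incremental planning by replaying simulated transitions from that model, while action selection remains fully reactive. The environment interface, corridor task, and parameter values are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

class DynaQAgent:
    """Minimal tabular Dyna-Q sketch: direct RL + model learning + planning."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, planning_steps=10):
        self.q = defaultdict(float)   # Q(s, a) value estimates
        self.model = {}               # learned world model: (s, a) -> (reward, next_state)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps

    def act(self, state):
        # Reactive execution: the action comes straight from current values;
        # no planning intervenes between perception and action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _update(self, s, a, r, s2):
        best_next = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def learn(self, s, a, r, s2):
        self._update(s, a, r, s2)       # learn from real experience
        self.model[(s, a)] = (r, s2)    # update the model of action effects
        # Incremental planning: extra updates from simulated experience
        # drawn from the learned (and possibly inaccurate) model.
        for _ in range(self.planning_steps):
            (ps, pa), (pr, ps2) = random.choice(list(self.model.items()))
            self._update(ps, pa, pr, ps2)

# Illustrative use on a hypothetical 1-D corridor: move right from state 0 to reach state 5.
if __name__ == "__main__":
    agent = DynaQAgent(actions=[-1, +1])
    for episode in range(50):
        s = 0
        while s != 5:
            a = agent.act(s)
            s2 = max(0, min(5, s + a))
            r = 1.0 if s2 == 5 else 0.0
            agent.learn(s, a, r, s2)
            s = s2
    greedy = {s: max(agent.actions, key=lambda a: agent.q[(s, a)]) for s in range(5)}
    print(greedy)  # expected: every non-goal state prefers +1 (move right)
```

The planning loop and the direct learning step share the same update rule, which is the sense in which planning results are "compiled" into the same value function that drives reactive execution.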