Abstract
In the adaptation-based programming (ABP) paradigm, programs may contain variable parts (function calls, parameter values, etc.) that can take a number of different values. Programs also contain reward statements with which a programmer provides feedback about how well the program is performing with respect to its goals (for example, achieving a high score on some scale). As the program is run repeatedly, a machine learning component, guided by the rewards, gradually adjusts the automatic choices made in the variable program parts so that they converge toward an optimal strategy.
ABP is a method for semi-automatic program generation: the choices and rewards supplied by the programmer allow standard machine-learning techniques to explore the design space the programmer has defined and to find an optimal instance of a program template. ABP effectively provides a DSL that allows non-experts in machine learning to exploit it to generate self-optimizing programs.
Unfortunately, in many cases the placement and structuring of choices and rewards can make it difficult to find an optimal solution to a program-generation problem. To address this problem, we have developed a dataflow analysis that computes influence tracks of choices and rewards. An augmented machine-learning technique can exploit this information to ignore misleading rewards and, more generally, to attribute rewards more accurately to the choices that actually influenced them. Moreover, this technique allows us to detect errors in the adaptive program that can arise during program maintenance. Our evaluation shows that the dataflow analysis can lead to improvements in performance.
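The choice/reward mechanism described above can be sketched in a few lines. This is a minimal illustration, not the ABP library's actual API: the `Adaptive` class, its `suggest`/`reward` methods, and the toy task are all hypothetical stand-ins for an adaptive choice point trained by a simple epsilon-greedy value estimate.

```python
import random

class Adaptive:
    """Stand-in for an ABP choice point: suggests one of several options
    and learns from rewards via an epsilon-greedy value estimate."""
    def __init__(self, options, epsilon=0.1):
        self.values = {o: 0.0 for o in options}
        self.counts = {o: 0 for o in options}
        self.epsilon = epsilon

    def suggest(self):
        # Mostly exploit the best-known option; occasionally explore.
        if random.random() < self.epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def reward(self, option, r):
        # Incremental average of rewards observed for this option.
        self.counts[option] += 1
        self.values[option] += (r - self.values[option]) / self.counts[option]

# The "program template": one variable part whose value is adapted.
step = Adaptive(["small", "large"])

def run_episode():
    choice = step.suggest()
    # Hypothetical task in which "large" happens to score better.
    r = 1.0 if choice == "large" else 0.2
    step.reward(choice, r)   # the program's reward statement
    return choice

random.seed(0)
for _ in range(500):
    run_episode()

best = max(step.values, key=step.values.get)
print(best)
```

Repeated runs drive the choice toward the higher-reward option. The reward-attribution problem the paper addresses arises when such a reward statement is also reached along paths that a given choice never influenced; the influence-track analysis lets the learner discount those rewards.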
Faster program adaptation through reward attribution inference
GPCE '12: Proceedings of the 11th International Conference on Generative Programming and Component Engineering