research-article

Faster program adaptation through reward attribution inference

Published: 26 September 2012

Abstract

In the adaptation-based programming (ABP) paradigm, programs may contain variable parts (function calls, parameter values, etc.) that can take a number of different values. Programs also contain reward statements with which a programmer can provide feedback about how well the program is performing with respect to its goals (for example, achieving a high score on some scale). As the program is run repeatedly, a machine learning component, guided by the rewards, gradually adjusts the automatic choices made in the variable program parts so that they converge toward an optimal strategy.
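The choice/reward interplay described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual API: `ChoicePoint`, `choose`, and `reward` are hypothetical names, and the learner is a simple epsilon-greedy averager standing in for the machine-learning component.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

class ChoicePoint:
    """A variable program part that learns which option earns the most reward."""
    def __init__(self, options, epsilon=0.1):
        self.options = list(options)
        self.epsilon = epsilon
        self.totals = {o: 0.0 for o in self.options}  # cumulative reward per option
        self.counts = {o: 0 for o in self.options}    # times each option was rewarded
        self.last = None

    def choose(self):
        # Mostly exploit the best-known option; occasionally explore.
        if random.random() < self.epsilon or not any(self.counts.values()):
            self.last = random.choice(self.options)
        else:
            self.last = max(self.options,
                            key=lambda o: self.totals[o] / max(self.counts[o], 1))
        return self.last

    def reward(self, r):
        # Programmer feedback, attributed to the most recent choice.
        self.totals[self.last] += r
        self.counts[self.last] += 1

# Repeatedly running the "program" lets the choice converge toward
# the option with the highest average reward (here, "b").
cp = ChoicePoint(["a", "b"])
payoff = {"a": 0.0, "b": 1.0}
for _ in range(500):
    cp.reward(payoff[cp.choose()])

best = max(cp.options, key=lambda o: cp.totals[o] / max(cp.counts[o], 1))
print(best)
```

In an actual ABP program the choice point would select among function calls or parameter values inside ordinary program code, and the reward statements would be placed by the programmer wherever progress toward the program's goal can be measured.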

ABP is a method for semi-automatic program generation in which programmer-specified choices and rewards allow standard machine-learning techniques to search a design space, defined by the programmer, for an optimal instance of a program template. ABP effectively provides a DSL that allows programmers who are not machine-learning experts to exploit machine learning to generate self-optimizing programs.

Unfortunately, in many cases the placement and structure of choices and rewards can be detrimental to finding an optimal solution to a program-generation problem. To address this problem, we have developed a dataflow analysis that computes influence tracks linking choices and rewards. An augmented machine-learning technique can exploit this information to ignore misleading rewards and, more generally, to attribute rewards to the choices that have actually influenced them. Moreover, this technique allows us to detect errors in the adaptive program that might arise during program maintenance. Our evaluation shows that the dataflow analysis can lead to improvements in performance.
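To make the credit-assignment idea concrete, here is a small sketch of how influence information might be consumed. The influence map below is written by hand purely for illustration; in the paper it would be the product of the dataflow analysis, and all names (`attribute`, `c1`, `r1`, etc.) are hypothetical.

```python
# Hypothetical influence map: reward statement r1 is influenced only by
# choice c1, and r2 only by c2. In the paper's setting this information
# would come from the dataflow analysis of the adaptive program.
influence = {"r1": {"c1"}, "r2": {"c2"}}

def attribute(trace, influence):
    """Credit each reward only to the choices that influenced it.

    `trace` is a list of ("choice", name, option) and
    ("reward", name, value) events from one program run."""
    credit = {}
    pending = []  # choices made so far in this run: (name, option)
    for event in trace:
        if event[0] == "choice":
            pending.append((event[1], event[2]))
        else:  # a reward event
            _, rname, value = event
            for cname, option in pending:
                if cname in influence[rname]:  # skip non-influencing choices
                    credit[(cname, option)] = credit.get((cname, option), 0.0) + value
    return credit

trace = [("choice", "c1", "x"), ("choice", "c2", "y"),
         ("reward", "r1", 5.0), ("reward", "r2", -1.0)]
print(attribute(trace, influence))
```

With the influence map, c1 is credited only with r1's 5.0 and c2 only with r2's -1.0, whereas a naive learner that credits every preceding choice would also blame c1 for the misleading -1.0.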



• Published in

  ACM SIGPLAN Notices, Volume 48, Issue 3 (GPCE '12), March 2013, 140 pages. ISSN: 0362-1340, EISSN: 1558-1160. DOI: 10.1145/2480361.
  GPCE '12: Proceedings of the 11th International Conference on Generative Programming and Component Engineering, September 2012, 148 pages. ISBN: 9781450311298. DOI: 10.1145/2371401.

      Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
