skip to main content
research-article

Code completion with statistical language models

Published:09 June 2014Publication History
Skip Abstract Section

Abstract

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.

Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.

Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.

References

  1. Android-er. http://android-er.blogspot.ch/2011/03/set-wallpaper-using-wallpapermanager.html.Google ScholarGoogle Scholar
  2. Android how-to's. https://sites.google.com/site/androidhowto/how-to-1/display-a-web-page.Google ScholarGoogle Scholar
  3. Stack overflow. http://www.stackoverflow.com/.Google ScholarGoogle Scholar
  4. Tutorial for android. http://www.tutorialforandroid.com/2009/01/changing-screen-brightness.html.Google ScholarGoogle Scholar
  5. Tutorial for android. http://www.tutorialforandroid.com/2009/10/turn-off-turn-on-wifi-in-android-using.html.Google ScholarGoogle Scholar
  6. Vogella tutorials. http://www.vogella.com/articles/AndroidMedia/article.html.Google ScholarGoogle Scholar
  7. Alnusair, A., Zhao, T., and Bodden, E. Effective API navigation and reuse. In IRI (aug. 2010), pp. 7--12.Google ScholarGoogle Scholar
  8. Ammons, G., Bodík, R., and Larus, J. R. Mining specifications. In POPL '02 (2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Beckman, N., Kim, D., and Aldrich, J. An empirical study of object protocols in the wild. In ECOOP'11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137--1155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cook, J. E., and Wolf, A. L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7, 3 (1998), 215--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dagenais, B., and Hendren, L. J. Enabling static analysis for partial Java programs. In OOPSLA'08, pp. 313--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Elman, J. L. Finding structure in time. Cognitive Science 14, 2 (1990), 179--211.Google ScholarGoogle ScholarCross RefCross Ref
  14. Gulwani, S. Dimensions in program synthesis. In symp. on Principles and practice of declarative programming (2010), PPDP '10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gvero, T., Kuncak, V., Kuraj, I., and Piskac, R. Complete completion using types and weights. In PLDI '13 (2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gvero, T., Kuncak, V., and Piskac, R. Interactive synthesis of code snippets. In CAV'11, vol. 6806 of LNCS. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. On the naturalness of software. In ICSE 2012 (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Holmes, R., and Murphy, G. C. Using structural context to recommend source code examples. In ICSE '05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Holmes, R., Walker, R. J., and Murphy, G. C. Strathcona example recommendation tool. In FSE'05, pp. 237--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans. on Acoustics, Speech and Singal processing (March 1987), vol. ASSP-35.Google ScholarGoogle Scholar
  21. Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (May 1995), vol. I.Google ScholarGoogle ScholarCross RefCross Ref
  22. Kombrink, S., Mikolov, T., Karafiát, M., and Burget, L. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH (2011), pp. 2877--2880.Google ScholarGoogle ScholarCross RefCross Ref
  23. Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. Jungloid mining: Helping to navigate the api jungle. In PLDI '05 (2005). Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Mikolov, T., Deoras, A., Povey, D., Burget, L., and Cernocky, J. Strategies for training large scale neural network language models. In ASRU 2011 (2011), IEEE Signal Processing Society.Google ScholarGoogle ScholarCross RefCross Ref
  25. Mishne, A., Shoham, S., and Yahav, E. Typestate-based semantic code search over partial programs. In OOPSLA '12 (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Perelman, D., Gulwani, S., Ball, T., and Grossman, D. Type-directed completion of partial expressions. In PLDI (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Reiss, S. P. Semantics-based code search. In ICSE'09. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE (2000), p. 2000.Google ScholarGoogle Scholar
  29. Shoham, S., Yahav, E., Fink, S., and Pistoia, M. Static specification mining using automata-based abstractions. In ISSTA '07 (2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Solar-Lezama, A. The sketching approach to program synthesis. In APLAS '09 (2009). Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S. A., and Saraswat, V. A. Combinatorial sketching for finite programs. In ASPLOS (2006), pp. 404--415. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Srivastava, S., Gulwani, S., and Foster, J. S. From program verification to program synthesis. In POPL '10 (2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Steensgaard, B. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (1996), POPL '96, pp. 32--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Stolcke, A. SRILM-an Extensible Language Modeling Toolkit. International Conference on Spoken Language Processing (2002).Google ScholarGoogle Scholar
  35. Thummalapenta, S., and Xie, T. Parseweb: a programmer assistant for reusing open source code on the web. In ASE '07 (2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Vallée-Rai, R., et al. Soot - a Java Optimization Framework. In Proceedings of CASCON 1999 (1999), pp. 125--135.Google ScholarGoogle Scholar
  37. Vechev, M., and Yahav, E. Deriving linearizable fine-grained concurrent objects. In PLDI '08 (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Wasylkowski, A., and Zeller, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Weimer, W., and Necula, G. Mining temporal specifications for error detection. In TACAS'05, vol. 3440 of LNCS. 2005, pp. 461--476. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Witten, I. H., and Bell, T. C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37, 4 (1991), 1085--1094. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06, pp. 282--291. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Yessenov, K., Xu, Z., and Solar-Lezama, A. Data-driven synthesis for object-oriented frameworks. In OOPSLA '11 (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. MAPO: Mining and recommending API usage patterns. In ECOOP'09. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Code completion with statistical language models

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 49, Issue 6
      PLDI '14
      June 2014
      598 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2666356
      • Editor:
      • Andy Gill
      Issue’s Table of Contents
      • cover image ACM Conferences
        PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2014
        619 pages
        ISBN:9781450327848
        DOI:10.1145/2594291

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 June 2014

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!