skip to main content
research-article
Open Access

Aroma: code recommendation via structural code search

Published:10 October 2019Publication History
Skip Abstract Section

Abstract

Programmers often write code that has similarity to existing code written somewhere. A tool that could help programmers to search such similar code would be immensely useful. Such a tool could help programmers to extend partially written code snippets to completely implement necessary functionality, help to discover extensions to the partial code which are commonly included by other programmers, help to cross-check against similar code written by other programmers, or help to add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.

Skip Supplemental Material Section

Supplemental Material

a152-luan

Presentation at OOPSLA '19

References

  1. Sushil Bajracharya, Trung Ngo, Erik Linstead, Yimeng Dou, Paul Rigor, Pierre Baldi, and Cristina Lopes. 2006. Sourcerer: A Search Engine for Open Source Code Supporting Structure-based Search. In Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications (OOPSLA ’06) . ACM, New York, NY, USA, 681–682. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from Examples to Improve Code Completion Systems. In Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE ’09) . ACM, New York, NY, USA, 213–222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Raymond P. L. Buse and Westley Weimer. 2012. Synthesizing API Usage Examples. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12) . IEEE Press, Piscataway, NJ, USA, 782–792. http://dl.acm.org/citation.cfm? id=2337223.2337316Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Wing-Kwan Chan, Hong Cheng, and David Lo. 2012. Searching Connected API Subgraph via Text Phrases. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE ’12) . ACM, New York, NY, USA, Article 10, 11 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Shaunak Chatterjee, Sudeep Juvekar, and Koushik Sen. 2009. SNIFF: A Search Engine for Java Using Free-Form Queries. In Fundamental Approaches to Software Engineering , Marsha Chechik and Martin Wirsing (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 385–400.Google ScholarGoogle Scholar
  6. J. R. Cordy and C. K. Roy. 2011. The NiCad Clone Detector. In 2011 IEEE 19th International Conference on Program Comprehension . 219–220. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Hill and J. Rideout. 2004. Automatic method completion. In Proceedings. 19th International Conference on Automated Software Engineering, 2004. 228–235. Google ScholarGoogle ScholarCross RefCross Ref
  8. R. Holmes and G. C. Murphy. 2005. Using structural context to recommend source code examples. In Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005. 117–125. Google ScholarGoogle ScholarCross RefCross Ref
  9. L. Jiang, G. Misherghi, Z. Su, and S. Glondu. 2007. DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones. In 29th International Conference on Software Engineering (ICSE’07). 96–105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. T. Kamiya, S. Kusumoto, and K. Inoue. 2002. CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering 28, 7 (July 2002), 654–670. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Kim, Y. Jung, S. Kim, and K. Yi. 2011. MeCC: memory comparison-based clone detector. In 2011 33rd International Conference on Software Engineering (ICSE) . 301–310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Kisub Kim, Dongsun Kim, Tegawendé F. Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: A Code-to-code Search Engine. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 946–957. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ken Krugler. 2013. Krugle Code Search Architecture. Springer New York, New York, NY, 103–120. Google ScholarGoogle ScholarCross RefCross Ref
  14. Otávio Augusto Lazzarini Lemos, Sushil Krishna Bajracharya, Joel Ossher, Ricardo Santos Morla, Paulo Cesar Masiero, Pierre Baldi, and Cristina Videira Lopes. 2007. CodeGenie: Using Test-cases to Search and Reuse Source Code. In Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering (ASE ’07) . ACM, New York, NY, USA, 525–526. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cristina V. Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. 2017. DéJàVu: A Map of Code Duplicates on GitHub. Proc. ACM Program. Lang. 1, OOPSLA, Article 84 (Oct. 2017), 28 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Fei Lv, Hongyu Zhang, Jian-guang Lou, Shaowei Wang, Dongmei Zhang, and Jianjun Zhao. 2015. CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) (ASE ’15) . IEEE Computer Society, Washington, DC, USA, 260–270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. L. Martie, T. D. LaToza, and A. v. d. Hoek. 2015. CodeExchange: Supporting Reformulation of Internet-Scale Code Queries in Context (T). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). 24–35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. McMillan, M. Grechanik, D. Poshyvanyk, C. Fu, and Q. Xie. 2012. Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications. IEEE Transactions on Software Engineering 38, 5 (Sept 2012), 1069–1087. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Qing Xie, and Chen Fu. 2011. Portfolio: Finding Relevant Functions and Their Usage. In Proceedings of the 33rd International Conference on Software Engineering (ICSE ’11). ACM, New York, NY, USA, 111–120. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Laura Moreno, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, and Andrian Marcus. 2015. How Can I Use This Method?. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 880–890. http://dl.acm.org/citation.cfm?id=2818754.2818860Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Mover, S. Sankaranarayanan, R. B. Olsen, and B. E. Chang. 2018. Mining framework usage graphs from app corpora. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), Vol. 00. 277–289. Google ScholarGoogle ScholarCross RefCross Ref
  22. Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, and Danny Dig. 2016a. API Code Recommendation Using Statistical Learning from Fine-grained Changes. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016) . ACM, New York, NY, USA, 511–522. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Ahmed Tamrawi, Hung Viet Nguyen, Jafar Al-Kofahi, and Tien N. Nguyen. 2012. Graph-based Pattern-oriented, Context-sensitive Source Code Completion. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12) . IEEE Press, Piscataway, NJ, USA, 69–79. http: //dl.acm.org/citation.cfm?id=2337223.2337232Google ScholarGoogle Scholar
  24. Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2018. Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018) . ACM, New York, NY, USA, 551–562. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen. 2009. Graph-based Mining of Multiple Object Usage Patterns. In Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE ’09) . ACM, New York, NY, USA, 383–392. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Tam The Nguyen, Hung Viet Pham, Phong Minh Vu, and Tung Thanh Nguyen. 2016b. Learning API Usages from Bytecode: A Statistical Approach. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 416–427. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Terence Parr. 2013. The Definitive ANTLR 4 Reference (2 ed.). Pragmatic Bookshelf.Google ScholarGoogle Scholar
  28. M. F. Porter. 1997. Readings in Information Retrieval. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, Chapter An Algorithm for Suffix Stripping, 313–316. http://dl.acm.org/citation.cfm?id=275537.275705Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Mukund Raghothaman, Yi Wei, and Youssef Hamadi. 2016. SWIM: Synthesizing What I Mean: Code Search and Idiomatic Snippet Synthesis. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 357–367. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Robbes and M. Lanza. 2008. How Program History Can Improve Code Completion. In 2008 23rd IEEE/ACM International Conference on Automated Software Engineering . 317–326. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim, Koushik Sen, and Satish Chandra. 2018. Retrieval on Source Code: A Neural Code Search. In Proceedings of the 2Nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018) . ACM, New York, NY, USA, 31–41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How Developers Search for Code: A Case Study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015) . ACM, New York, NY, USA, 191–201. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes. 2018. Oreo: Detection of Clones in the Twilight Zone. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018) . ACM, New York, NY, USA, 354–365. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K. Roy, and Cristina V. Lopes. 2016. SourcererCC: Scaling Code Clone Detection to Big-code. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 1157–1168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Gerard Salton and Michael J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Raphael Sirres, Tegawendé F. Bissyandé, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, and Yves Le Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering 23, 5 (01 Oct 2018), 2622–2654. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes. 2014. Live API Documentation. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014) . ACM, New York, NY, USA, 643–652. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Christoph Treude and Martin P. Robillard. 2016. Augmenting API Documentation with Insights from Stack Overflow. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16) . ACM, New York, NY, USA, 392–403. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Y. Ueda, T. Kamiya, S. Kusumoto, and K. Inoue. 2002. On detection of gapped code clones using gap locations. In Ninth Asia-Pacific Software Engineering Conference, 2002. 327–336. Google ScholarGoogle ScholarCross RefCross Ref
  40. Julien Verlaguet and Alok Menghrajani. 2014. Hack: a new programming language for HHVM. https://code.fb.com/developertools/hack-a-new-programming-language-for-hhvm/ .Google ScholarGoogle Scholar
  41. Pengcheng Wang, Jeffrey Svajlenko, Yanzhao Wu, Yun Xu, and Chanchal K. Roy. 2018. CCAligner: A Token Based Large-gap Clone Detector. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 1066–1077. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep Learning Code Fragments for Code Clone Detection. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016) . ACM, New York, NY, USA, 87–98. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Hao Zhong, Tao Xie, Lu Zhang, Jian Pei, and Hong Mei. 2009. MAPO: Mining and Recommending API Usage Patterns. In ECOOP 2009 – Object-Oriented Programming , Sophia Drossopoulou (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 318–343.Google ScholarGoogle Scholar

Index Terms

  1. Aroma: code recommendation via structural code search

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!