Abstract
Programmers often write code that has similarity to existing code written somewhere. A tool that could help programmers to search such similar code would be immensely useful. Such a tool could help programmers to extend partially written code snippets to completely implement necessary functionality, help to discover extensions to the partial code which are commonly included by other programmers, help to cross-check against similar code written by other programmers, or help to add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.
Supplemental Material
- Sushil Bajracharya, Trung Ngo, Erik Linstead, Yimeng Dou, Paul Rigor, Pierre Baldi, and Cristina Lopes. 2006. Sourcerer: A Search Engine for Open Source Code Supporting Structure-based Search. In Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications (OOPSLA ’06) . ACM, New York, NY, USA, 681–682. Google Scholar
Digital Library
- Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from Examples to Improve Code Completion Systems. In Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE ’09) . ACM, New York, NY, USA, 213–222. Google Scholar
Digital Library
- Raymond P. L. Buse and Westley Weimer. 2012. Synthesizing API Usage Examples. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12) . IEEE Press, Piscataway, NJ, USA, 782–792. http://dl.acm.org/citation.cfm? id=2337223.2337316Google Scholar
Digital Library
- Wing-Kwan Chan, Hong Cheng, and David Lo. 2012. Searching Connected API Subgraph via Text Phrases. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE ’12) . ACM, New York, NY, USA, Article 10, 11 pages. Google Scholar
Digital Library
- Shaunak Chatterjee, Sudeep Juvekar, and Koushik Sen. 2009. SNIFF: A Search Engine for Java Using Free-Form Queries. In Fundamental Approaches to Software Engineering , Marsha Chechik and Martin Wirsing (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 385–400.Google Scholar
- J. R. Cordy and C. K. Roy. 2011. The NiCad Clone Detector. In 2011 IEEE 19th International Conference on Program Comprehension . 219–220. Google Scholar
Digital Library
- R. Hill and J. Rideout. 2004. Automatic method completion. In Proceedings. 19th International Conference on Automated Software Engineering, 2004. 228–235. Google Scholar
Cross Ref
- R. Holmes and G. C. Murphy. 2005. Using structural context to recommend source code examples. In Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005. 117–125. Google Scholar
Cross Ref
- L. Jiang, G. Misherghi, Z. Su, and S. Glondu. 2007. DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones. In 29th International Conference on Software Engineering (ICSE’07). 96–105. Google Scholar
Digital Library
- T. Kamiya, S. Kusumoto, and K. Inoue. 2002. CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering 28, 7 (July 2002), 654–670. Google Scholar
Digital Library
- H. Kim, Y. Jung, S. Kim, and K. Yi. 2011. MeCC: memory comparison-based clone detector. In 2011 33rd International Conference on Software Engineering (ICSE) . 301–310. Google Scholar
Digital Library
- Kisub Kim, Dongsun Kim, Tegawendé F. Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: A Code-to-code Search Engine. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 946–957. Google Scholar
Digital Library
- Ken Krugler. 2013. Krugle Code Search Architecture. Springer New York, New York, NY, 103–120. Google Scholar
Cross Ref
- Otávio Augusto Lazzarini Lemos, Sushil Krishna Bajracharya, Joel Ossher, Ricardo Santos Morla, Paulo Cesar Masiero, Pierre Baldi, and Cristina Videira Lopes. 2007. CodeGenie: Using Test-cases to Search and Reuse Source Code. In Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering (ASE ’07) . ACM, New York, NY, USA, 525–526. Google Scholar
Digital Library
- Cristina V. Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. 2017. DéJàVu: A Map of Code Duplicates on GitHub. Proc. ACM Program. Lang. 1, OOPSLA, Article 84 (Oct. 2017), 28 pages. Google Scholar
Digital Library
- Fei Lv, Hongyu Zhang, Jian-guang Lou, Shaowei Wang, Dongmei Zhang, and Jianjun Zhao. 2015. CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) (ASE ’15) . IEEE Computer Society, Washington, DC, USA, 260–270. Google Scholar
Digital Library
- L. Martie, T. D. LaToza, and A. v. d. Hoek. 2015. CodeExchange: Supporting Reformulation of Internet-Scale Code Queries in Context (T). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). 24–35. Google Scholar
Digital Library
- C. McMillan, M. Grechanik, D. Poshyvanyk, C. Fu, and Q. Xie. 2012. Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications. IEEE Transactions on Software Engineering 38, 5 (Sept 2012), 1069–1087. Google Scholar
Digital Library
- Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Qing Xie, and Chen Fu. 2011. Portfolio: Finding Relevant Functions and Their Usage. In Proceedings of the 33rd International Conference on Software Engineering (ICSE ’11). ACM, New York, NY, USA, 111–120. Google Scholar
Digital Library
- Laura Moreno, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, and Andrian Marcus. 2015. How Can I Use This Method?. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 880–890. http://dl.acm.org/citation.cfm?id=2818754.2818860Google Scholar
Digital Library
- S. Mover, S. Sankaranarayanan, R. B. Olsen, and B. E. Chang. 2018. Mining framework usage graphs from app corpora. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), Vol. 00. 277–289. Google Scholar
Cross Ref
- Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, and Danny Dig. 2016a. API Code Recommendation Using Statistical Learning from Fine-grained Changes. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016) . ACM, New York, NY, USA, 511–522. Google Scholar
Digital Library
- Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Ahmed Tamrawi, Hung Viet Nguyen, Jafar Al-Kofahi, and Tien N. Nguyen. 2012. Graph-based Pattern-oriented, Context-sensitive Source Code Completion. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12) . IEEE Press, Piscataway, NJ, USA, 69–79. http: //dl.acm.org/citation.cfm?id=2337223.2337232Google Scholar
- Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2018. Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018) . ACM, New York, NY, USA, 551–562. Google Scholar
Digital Library
- Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen. 2009. Graph-based Mining of Multiple Object Usage Patterns. In Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE ’09) . ACM, New York, NY, USA, 383–392. Google Scholar
Digital Library
- Tam The Nguyen, Hung Viet Pham, Phong Minh Vu, and Tung Thanh Nguyen. 2016b. Learning API Usages from Bytecode: A Statistical Approach. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 416–427. Google Scholar
Digital Library
- Terence Parr. 2013. The Definitive ANTLR 4 Reference (2 ed.). Pragmatic Bookshelf.Google Scholar
- M. F. Porter. 1997. Readings in Information Retrieval. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, Chapter An Algorithm for Suffix Stripping, 313–316. http://dl.acm.org/citation.cfm?id=275537.275705Google Scholar
Digital Library
- Mukund Raghothaman, Yi Wei, and Youssef Hamadi. 2016. SWIM: Synthesizing What I Mean: Code Search and Idiomatic Snippet Synthesis. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 357–367. Google Scholar
Digital Library
- R. Robbes and M. Lanza. 2008. How Program History Can Improve Code Completion. In 2008 23rd IEEE/ACM International Conference on Automated Software Engineering . 317–326. Google Scholar
Digital Library
- Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim, Koushik Sen, and Satish Chandra. 2018. Retrieval on Source Code: A Neural Code Search. In Proceedings of the 2Nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018) . ACM, New York, NY, USA, 31–41. Google Scholar
Digital Library
- Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How Developers Search for Code: A Case Study. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015) . ACM, New York, NY, USA, 191–201. Google Scholar
Digital Library
- Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes. 2018. Oreo: Detection of Clones in the Twilight Zone. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018) . ACM, New York, NY, USA, 354–365. Google Scholar
Digital Library
- Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K. Roy, and Cristina V. Lopes. 2016. SourcererCC: Scaling Code Clone Detection to Big-code. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 1157–1168. Google Scholar
Digital Library
- Gerard Salton and Michael J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA.Google Scholar
Digital Library
- Raphael Sirres, Tegawendé F. Bissyandé, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, and Yves Le Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering 23, 5 (01 Oct 2018), 2622–2654. Google Scholar
Digital Library
- Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes. 2014. Live API Documentation. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014) . ACM, New York, NY, USA, 643–652. Google Scholar
Digital Library
- Christoph Treude and Martin P. Robillard. 2016. Augmenting API Documentation with Insights from Stack Overflow. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16) . ACM, New York, NY, USA, 392–403. Google Scholar
Digital Library
- Y. Ueda, T. Kamiya, S. Kusumoto, and K. Inoue. 2002. On detection of gapped code clones using gap locations. In Ninth Asia-Pacific Software Engineering Conference, 2002. 327–336. Google Scholar
Cross Ref
- Julien Verlaguet and Alok Menghrajani. 2014. Hack: a new programming language for HHVM. https://code.fb.com/developertools/hack-a-new-programming-language-for-hhvm/ .Google Scholar
- Pengcheng Wang, Jeffrey Svajlenko, Yanzhao Wu, Yun Xu, and Chanchal K. Roy. 2018. CCAligner: A Token Based Large-gap Clone Detector. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 1066–1077. Google Scholar
Digital Library
- Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep Learning Code Fragments for Code Clone Detection. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016) . ACM, New York, NY, USA, 87–98. Google Scholar
Digital Library
- Hao Zhong, Tao Xie, Lu Zhang, Jian Pei, and Hong Mei. 2009. MAPO: Mining and Recommending API Usage Patterns. In ECOOP 2009 – Object-Oriented Programming , Sophia Drossopoulou (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 318–343.Google Scholar
Index Terms
Aroma: code recommendation via structural code search
Recommendations
Learning to rank code examples for code search engines
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to user'...
Big Code Search: a Bibliography
Code search is an essential task in software development. Developers often search the internet and other code databases for necessary source code snippets to ease the development efforts. Code search techniques also help learn programming as novice ...
Context-aware code recommendation in Intellij IDEA
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software EngineeringDevelopers spend a lot of time online, searching for code to help them implement their desired features. While code recommenders help improve developers’ productivity, there is currently no support for context-aware code recommendation for ...






Comments