research-article

Reusable Component Retrieval: A Semantic Search Approach for Low-Resource Languages

Published: 10 May 2023

Abstract

Programmers commonly reuse existing code, typically by issuing natural language queries to search engines. The goal of code retrieval is to find the most relevant snippet in a corpus of code snippets, yet code retrieval frameworks for low-resource languages remain insufficient. Retrieving the most relevant snippet efficiently requires closing the semantic gap between the code stored in a repository and the user's query, which is a natural language description. The primary objective of this research is to contribute a code search framework that can be extended to low-resource languages. The secondary objective is to provide a retrieval mechanism that returns code semantically relevant to the user's query, helping programmers locate source code they want to use when developing new applications. The proposed approach is implemented as a web platform for searching source code. Because code retrieval is a complex task, the approach incorporates a semantic search mechanism: a semantic model that generates meanings or synonyms of words, built by integrating ontologies with Natural Language Processing. System performance and classification accuracy are measured with precision, recall, and F1-score, and the proposed approach is compared with state-of-the-art baseline models. Ranking the retrieved results shows that our approach significantly outperforms robust code matching, and the evaluation shows that semantic matching improves source code retrieval. This study is a substantial step toward integrating programming expertise with code retrieval techniques. Moreover, our system informs users when and how it is used for successful semantic searching.
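The abstract names two technical ingredients: expanding a query with meanings or synonyms so it can match code written with different vocabulary, and scoring the retrieved results with precision, recall, and F1. The Python sketch below illustrates both under stated assumptions. The SYNONYMS table, the STOPWORDS list, the three-snippet CORPUS, and the term-overlap ranking rule are all hypothetical stand-ins; they are not the authors' ontology-based model or their ranking function.

```python
# Hypothetical sketch: synonym-based query expansion for code search,
# plus precision / recall / F1 for scoring the retrieved snippets.
# SYNONYMS, STOPWORDS, CORPUS, and the overlap score are illustrative
# assumptions; they are not the paper's ontology-based model.
import re

# Stand-in for ontology/NLP-derived word meanings (one-directional lookup).
SYNONYMS = {
    "sort": {"order", "arrange"},
    "list": {"array", "sequence"},
    "read": {"load", "open"},
    "file": {"document"},
}

# Very small stopword list so filler words like "a" do not count as matches.
STOPWORDS = {"a", "an", "the", "to", "of", "in"}

# Toy repository of code snippets keyed by a snippet id.
CORPUS = {
    "s1": "def order_items(array): return sorted(array)",
    "s2": "def load_document(path): return open(path).read()",
    "s3": "def add(a, b): return a + b",
}


def tokenize(text):
    """Split camelCase, lowercase, split on non-alphanumerics, drop stopwords."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z0-9]+", text.lower())
            if t and t not in STOPWORDS]


def expand_query(query):
    """Return the query's terms together with their synonyms."""
    terms = set(tokenize(query))
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms


def rank(query, corpus, k=2):
    """Rank snippets by how many expanded query terms they contain (top-k ids)."""
    expanded = expand_query(query)
    scores = {sid: len(expanded & set(tokenize(code)))
              for sid, code in corpus.items()}
    hits = [sid for sid, score in scores.items() if score > 0]
    return sorted(hits, key=lambda sid: (-scores[sid], sid))[:k]


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    results = rank("sort a list", CORPUS)
    print(results)  # ['s1']
    print(precision_recall_f1(results, relevant={"s1"}))  # (1.0, 1.0, 1.0)
```

In this toy corpus the query "sort a list" shares no token with snippet s1 (its tokens include "order", "array", and "sorted"), so plain keyword overlap would miss it; expansion through "order" and "array" retrieves it. A fuller system would replace the dictionary with ontology lookups and add stemming so that "sorted" also matches "sort".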

        • Published in

          ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
          May 2023
          653 pages
          ISSN:2375-4699
          EISSN:2375-4702
          DOI:10.1145/3596451


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 May 2023
          • Online AM: 22 September 2022
          • Accepted: 17 September 2022
          • Revised: 26 August 2022
          • Received: 4 March 2022


        • Article Metrics

          • Downloads (last 12 months): 180
          • Downloads (last 6 weeks): 14
