Abstract
Programmers commonly reuse existing code, typically locating it by issuing natural language queries to search engines. The main aim of code retrieval is to find the most relevant snippet in a corpus of code snippets. However, code retrieval frameworks for low-resource languages remain insufficient. Retrieving the most relevant snippet efficiently requires bridging the semantic gap between the code snippets stored in the repository and the user’s query, which is a natural language description. The primary objective of this research is to contribute to this field with a code search framework that can be extended to low-resource languages. The secondary objective is to provide a code retrieval mechanism that returns results semantically relevant to the user’s query, enabling programmers to locate source code they can reuse when developing new applications. The proposed approach is implemented as a web platform for searching source code. Because code retrieval is a complex task, the approach incorporates a semantic search mechanism built on a semantic model that generates meanings or synonyms of words. The model integrates ontologies with Natural Language Processing (NLP). System performance and classification accuracy are measured using precision, recall, and F1-score, and the proposed approach is compared with state-of-the-art baseline models. The retrieved results are ranked, and the comparison shows that our approach significantly outperforms robust code matching. Our evaluation shows that semantic matching improves source code retrieval. This study marks a substantial advancement in integrating programming expertise with code retrieval techniques. Moreover, the system informs users when and how semantic search is applied to produce successful results.
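The abstract describes semantic matching built on a model that generates meanings or synonyms of query words, evaluated with precision, recall, and F1-score. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. A small hand-written synonym map (`SYNONYMS`, hypothetical) stands in for the ontology and NLP component. Snippets are ranked by the fraction of expanded query terms they contain. The helper names (`tokenize`, `expand_query`, `rank`, `precision_recall_f1`) and the three-snippet corpus are invented for illustration.

```python
import re

# Hypothetical synonym map standing in for the ontology/NLP component;
# a real system would derive these relations from an ontology or lexicon.
SYNONYMS: dict[str, set[str]] = {
    "sort": {"order", "arrange"},
    "list": {"array", "sequence"},
    "read": {"load", "open"},
    "file": {"document"},
}
STOPWORDS = {"a", "an", "the", "to", "of", "in", "for"}


def tokenize(text: str) -> list[str]:
    """Split camelCase and snake_case identifiers into lowercase words, minus stop words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    words = (w.lower() for w in re.findall(r"[A-Za-z]+", text))
    return [w for w in words if w not in STOPWORDS]


def expand_query(query: str) -> set[str]:
    """Add each term's synonyms, plus the head word of any synonym group it belongs to."""
    terms = set(tokenize(query))
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
        for head, syns in SYNONYMS.items():
            if term in syns:
                expanded |= {head} | syns
    return expanded


def rank(query: str, corpus: dict[str, str], expand: bool = True, top_k: int = 3) -> list[str]:
    """Rank snippet ids by the fraction of (optionally expanded) query terms each snippet contains."""
    q = expand_query(query) if expand else set(tokenize(query))
    if not q:
        return []
    scores = {}
    for sid, code in corpus.items():
        overlap = len(q & set(tokenize(code)))
        if overlap:
            scores[sid] = overlap / len(q)
    # Highest score first; ties broken by snippet id for a deterministic order.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]


def precision_recall_f1(retrieved: list[str], relevant: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single query."""
    hits = len(set(retrieved) & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical toy corpus of code snippets keyed by id.
CORPUS = {
    "s1": "def sort_array(values): return sorted(values)",
    "s2": "def load_document(path): return open(path).read()",
    "s3": "def add(a, b): return a + b",
}

if __name__ == "__main__":
    query, relevant = "order a sequence", {"s1"}
    for expand in (False, True):
        hits = rank(query, CORPUS, expand=expand)
        print(expand, hits, precision_recall_f1(hits, relevant))
```

For the query "order a sequence", plain keyword matching retrieves nothing, because neither word appears in `s1`. The expanded query maps "order" to "sort" and "sequence" to "array", so it retrieves `s1`. This is a toy version of the semantic gap the abstract targets. A real system would replace the synonym map with ontology lookups and use a stronger ranking function than term overlap.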
Reusable Component Retrieval: A Semantic Search Approach for Low-Resource Languages