Abstract
Feature Location (FL) is a common task in the Software Engineering field, especially in the maintenance and evolution of software products. The results of FL depend greatly on the style in which Feature Descriptions and software artifacts are written; therefore, Natural Language Processing (NLP) techniques are used to process them. In this paper, we analyze the influence of the most common NLP techniques on FL in Conceptual Models through Latent Semantic Indexing, as well as the influence of human participation when embedding domain knowledge in the process. We evaluated the techniques in a real-world industrial case study in the rolling stock domain.
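The abstract does not detail the pipeline, so the following is only a minimal sketch of the general idea (NLP preprocessing followed by LSI ranking of model fragments against a Feature Description), not the authors' implementation. It assumes each model fragment has already been flattened to text (for example, the names of its elements), uses a tiny illustrative stopword list, omits techniques such as stemming, lemmatization, or POS tagging, and computes the rank-k SVD by power iteration on A^T A so it has no dependencies. The function names and the example fragments are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "when", "with"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx > 0 and ny > 0 else 0.0


def top_eigenpairs(m, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _ in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        if lam <= 1e-12:
            break  # no remaining latent dimensions
        pairs.append((lam, v))
        # Deflate so the next power iteration finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_rank(fragments, query, k=2):
    """Rank text fragments against a query with LSI (rank-k SVD, query folding-in)."""
    counts = [Counter(preprocess(f)) for f in fragments]
    vocab = sorted(set().union(*counts))
    # Term-document matrix A: a[t][j] = frequency of term t in fragment j.
    a = [[c[t] for c in counts] for t in vocab]
    cols = list(zip(*a))  # columns of A, one per fragment
    # Eigenpairs of A^T A give the right singular vectors V_k and sigma_i^2.
    ata = [[dot(ci, cj) for cj in cols] for ci in cols]
    pairs = top_eigenpairs(ata, k)
    q_counts = Counter(preprocess(query))
    q = [q_counts[t] for t in vocab]
    atq = [dot(col, q) for col in cols]  # A^T q
    # Fold the query in: q_hat = q^T U_k S_k^{-1} = (A^T q)^T V_k S_k^{-2}.
    q_hat = [dot(atq, v) / lam for lam, v in pairs]
    # Fragment j is represented by row j of V_k; compare by cosine similarity.
    scores = [(j, cosine(q_hat, [v[j] for _, v in pairs])) for j in range(len(fragments))]
    return sorted(scores, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    fragments = [
        "Door opens; door closes",
        "Door closes with a warning signal",
        "Brake pressure in the bogie",
        "Bogie brake sensor",
    ]
    for index, score in lsi_rank(fragments, "warning signal", k=2):
        print(index, round(score, 3))
```

With these hypothetical fragments, fragments 0 and 1 both score close to 1.0 and the brake fragments close to 0.0: fragment 0 shares no term with the query, yet LSI places it in the same latent dimension as fragment 1. This is the kind of effect that makes the vocabulary produced by the NLP preprocessing step matter for FL results.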
Analyzing the impact of natural language processing over feature location in models
GPCE 2017: Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences