
Analyzing the impact of natural language processing over feature location in models

Published: 23 October 2017

Abstract

Feature Location (FL) is a common task in Software Engineering, especially in the maintenance and evolution of software products. The results of FL depend heavily on the style in which Feature Descriptions and software artifacts are written. Therefore, Natural Language Processing (NLP) techniques are used to process them. In this paper, we analyze the influence of the most common NLP techniques on FL in Conceptual Models through Latent Semantic Indexing, as well as the influence of human participation when embedding domain knowledge in the process. We evaluated the techniques in a real-world industrial case study in the rolling stock domain.

Published in

ACM SIGPLAN Notices, Volume 52, Issue 12 (GPCE '17)
December 2017, 258 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3170492

GPCE 2017: Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences
October 2017, 258 pages
ISBN: 9781450355247
DOI: 10.1145/3136040

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
