research-article

Concept Drift in Software Defect Prediction: A Method for Detecting and Handling the Drift

Published: 19 May 2023

Abstract

Software Defect Prediction (SDP) is crucial to software quality assurance in software engineering. SDP analyzes software metrics data to predict defect-prone software modules in a timely manner. The prediction process is automated by constructing defect prediction classification models using machine learning techniques. These models are trained on metrics data from historical projects of similar types and, based on that learned experience, are used to predict defect-prone modules in the software currently under test. The models perform well if the concept is stationary in a dynamic software development environment, but their performance degrades unexpectedly when the concept changes, a phenomenon known as concept drift (CD). CD detection is therefore an important activity for improving the overall accuracy of the prediction model. Previous studies on SDP have shown that CD may occur in software defect data and that the defect prediction model in use may need to be updated to deal with it. This process of handling CD is known as CD adaptation. More work is still needed in this direction in the SDP domain. In this article, we propose a pair of paired learners (PoPL) approach for handling CD in SDP. We combine the drift detection capabilities of two independent paired learners and use the paired learner (PL) with the best recent performance for the next prediction. We experimented on various software defect datasets gathered from public data repositories. The experimental results show that our proposed approach performs better than existing similar works and the base PL model on various performance measures.
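The PoPL idea summarized in the abstract can be illustrated with a rough sketch. The abstract does not state the base learners, window sizes, thresholds, or how "recent performance" is measured, so everything below is an assumption made for illustration: the toy `NearestCentroid` learner, the `window`, `threshold`, and `recent` parameters, the choice to make the two paired learners differ only in those parameters, and the synthetic one-feature stream in `run_demo`. Each paired learner loosely follows the general paired-learner scheme (a stable learner plus a reactive learner trained on a recent window, with the stable learner replaced when it is wrong too often where the reactive one is right). This is not the authors' implementation.

```python
from collections import deque


class NearestCentroid:
    """Toy base learner (an assumption): predicts the label of the closest class mean."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of examples seen

    def learn(self, x, y):
        sums = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            sums[i] += value
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return 0  # default to "not defect-prone" before any training

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=sq_dist)


class PairedLearner:
    """Stable learner plus a reactive learner trained on the last `window` examples."""

    def __init__(self, window, threshold):
        self.threshold = threshold
        self.stable = NearestCentroid()
        self.recent = deque(maxlen=window)  # recent labelled examples
        self.flags = deque(maxlen=window)   # 1 where stable was wrong, reactive right
        self.drifts = 0

    def _reactive(self):
        model = NearestCentroid()
        for x, y in self.recent:
            model.learn(x, y)
        return model

    def predict(self, x):
        return self.stable.predict(x)

    def update(self, x, y):
        # Test both learners on the new example before training on it.
        stable_ok = self.stable.predict(x) == y
        reactive_ok = self._reactive().predict(x) == y
        self.flags.append(1 if (not stable_ok and reactive_ok) else 0)
        self.recent.append((x, y))
        if sum(self.flags) > self.threshold:
            # Drift signalled: replace the stable learner with a reactive one.
            self.stable = self._reactive()
            self.flags.clear()
            self.drifts += 1
        else:
            self.stable.learn(x, y)


class PairOfPairedLearners:
    """Predicts with whichever paired learner was more accurate recently."""

    def __init__(self, first, second, recent=20):
        self.learners = [first, second]
        self.hits = [deque(maxlen=recent), deque(maxlen=recent)]

    def _best(self):
        acc = [sum(h) / len(h) if h else 0.0 for h in self.hits]
        return 0 if acc[0] >= acc[1] else 1

    def predict(self, x):
        return self.learners[self._best()].predict(x)

    def update(self, x, y):
        for learner, hits in zip(self.learners, self.hits):
            hits.append(1 if learner.predict(x) == y else 0)
            learner.update(x, y)


def run_demo():
    """Prequential (test-then-train) run on a synthetic stream with one abrupt drift."""
    popl = PairOfPairedLearners(PairedLearner(10, 3), PairedLearner(20, 5))
    correct = []
    for i in range(200):
        x = [(i % 10) / 10.0]
        high = x[0] >= 0.5
        y = int(high) if i < 100 else int(not high)  # concept flips at i = 100
        correct.append(popl.predict(x) == y)
        popl.update(x, y)
    return popl, correct
```

Calling `run_demo()` returns the fitted PoPL object and a per-example list of prediction hits, which can be inspected to see how long the sketch takes to recover after the synthetic drift. Real defect data would need a proper base classifier and tuned parameters.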

    • Published in

      ACM Transactions on Internet Technology, Volume 23, Issue 2
      May 2023
      276 pages
      ISSN: 1533-5399
      EISSN: 1557-6051
      DOI: 10.1145/3597634
      • Editor: Ling Liu

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 May 2023
      • Online AM: 27 March 2023
      • Accepted: 21 March 2023
      • Revised: 17 December 2022
      • Received: 26 October 2021
      Qualifiers

      • research-article