ABSTRACT
Finding and fixing code quality concerns, such as defects or poor understandability of code, decreases software development and evolution costs. A common industrial practice for identifying code quality concerns early is code review. While code reviews help to identify problems early, they also impose costs on development and only take place after a code change is already completed. The goal of our research is to automatically identify code quality concerns while a developer is making a change to the code. By using biometrics, such as heart rate variability, we aim to determine the difficulty a developer experiences while working on a part of the code, and to identify and help fix code quality concerns before they are even committed to the repository.
In a field study with ten professional developers over a two-week period, we investigated the use of biometrics to determine code quality concerns. Our results show that biometrics are indeed able to predict quality concerns in parts of the code while a developer is working on them, improving upon a naive classifier by more than 26% and outperforming classifiers based on more traditional metrics. In a second study with five professional developers from a different country and company, we found evidence that some of the findings from our initial study can be replicated. Overall, the results from the presented studies suggest that biometrics have the potential to predict code quality concerns online and thus lower development and evolution costs.