Well-typed programs can go wrong: a study of typing-related bugs in JVM compilers

Published: 15 October 2021

Abstract

Although compiler testing has made substantial progress, research has mainly focused on detecting compiler crashes and subtle miscompilations caused by bugs in the implementation of compiler optimizations. Surprisingly, this growing body of work neglects other compiler components, most notably the front-end. In statically typed programming languages with rich and expressive type systems and modern features, such as type inference or a mix of object-oriented and functional programming features, static typing in compiler front-ends is a complex process that is prone to a high density of bugs. Such bugs can lead to the acceptance of incorrect programs (breaking code portability or the type system's soundness), the rejection of correct (e.g., well-typed) programs, and the reporting of misleading errors and warnings.
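The three consequences listed above can be read as a test oracle. The Java sketch below shows one way a testing harness might label a compiler's behavior on a single test program, assuming the tester already knows whether that program is well-typed. The class TypingOracle, the Outcome shape, and the symptom names are hypothetical illustrations, not the study's classification scheme; misleading warnings on accepted programs are deliberately not modeled.

```java
import java.util.Objects;

// Hypothetical oracle: compares the expected typing verdict of a test
// program with what a compiler actually did and names the symptom.
final class TypingOracle {

    // Symptoms from the abstract, plus an outcome for "no bug observed".
    enum Symptom {
        NONE,               // compiler behaved as expected
        ACCEPTS_ILL_TYPED,  // breaks soundness or portability
        REJECTS_WELL_TYPED, // unexpected compile-time error
        MISLEADING_REPORT   // correct rejection, but the error points elsewhere
    }

    // Result of one compiler run (assumed shape, for illustration only).
    static final class Outcome {
        final boolean accepted;
        // Only consulted when an ill-typed program is rejected.
        final boolean diagnosticMatchesFault;

        Outcome(boolean accepted, boolean diagnosticMatchesFault) {
            this.accepted = accepted;
            this.diagnosticMatchesFault = diagnosticMatchesFault;
        }
    }

    static Symptom classify(boolean expectedWellTyped, Outcome outcome) {
        Objects.requireNonNull(outcome, "outcome");
        if (expectedWellTyped) {
            // A well-typed program must compile.
            return outcome.accepted ? Symptom.NONE : Symptom.REJECTS_WELL_TYPED;
        }
        if (outcome.accepted) {
            // An ill-typed program slipped through the type checker.
            return Symptom.ACCEPTS_ILL_TYPED;
        }
        // Ill-typed and rejected: the verdict is right, so check the message.
        return outcome.diagnosticMatchesFault ? Symptom.NONE : Symptom.MISLEADING_REPORT;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, new Outcome(false, false)));  // REJECTS_WELL_TYPED
        System.out.println(classify(false, new Outcome(true, false)));  // ACCEPTS_ILL_TYPED
        System.out.println(classify(false, new Outcome(false, false))); // MISLEADING_REPORT
        System.out.println(classify(false, new Outcome(false, true)));  // NONE
    }
}
```

The hard part in practice is the first argument: deciding whether a program is well-typed requires either a trusted reference (for example, the language specification) or agreement among several compilers, which this sketch simply assumes is available.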

We conduct what is, to the best of our knowledge, the first empirical study aimed at understanding and characterizing typing-related compiler bugs. To do so, we manually study 320 typing-related bugs (along with their fixes and test cases), randomly sampled from the compilers of four mainstream JVM languages, namely Java, Scala, Kotlin, and Groovy. We evaluate each bug in terms of several aspects, including its symptom, its root cause, the size of its fix, and the characteristics of the bug-revealing test cases. Some representative observations indicate that: (1) more than half of the typing-related bugs manifest as unexpected compile-time errors, i.e., the buggy compiler wrongly rejects semantically correct programs; (2) the majority of typing-related bugs lie in the implementations of the underlying type systems and in other core components related to operations on types; (3) parametric polymorphism is the most pervasive feature in the corresponding test cases; and (4) one third of typing-related bugs are triggered by non-compilable programs.
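Observation (3) concerns parametric polymorphism. Purely as an illustration (this program is not drawn from the studied bug reports), the Java sketch below combines a bounded type parameter, use-site wildcards, a generic higher-order method, and type inference at the call site; a correct Java compiler must accept it. Conversely, a call such as maxOf(Arrays.asList(new Object())) must be rejected, because Object does not satisfy the bound T extends Comparable<? super T>; inputs of that kind are the sort of non-compilable program that observation (4) refers to. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class PolymorphismExample {

    // Bounded type parameter plus a use-site wildcard: accepts any list whose
    // elements are a subtype of some self-comparable type T.
    static <T extends Comparable<? super T>> T maxOf(List<? extends T> xs) {
        if (xs.isEmpty()) {
            throw new IllegalArgumentException("empty list");
        }
        T best = xs.get(0);
        for (T x : xs) {
            if (x.compareTo(best) > 0) {
                best = x;
            }
        }
        return best;
    }

    // Generic higher-order method; A and B are inferred at each call site.
    static <A, B> List<B> mapAll(List<A> xs, Function<? super A, ? extends B> f) {
        List<B> out = new ArrayList<>();
        for (A x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    public static void main(String[] args) {
        // A = String and B = Integer are inferred; no explicit type arguments needed.
        List<Integer> lengths = mapAll(Arrays.asList("a", "bbb", "cc"), String::length);
        Integer longest = maxOf(lengths);
        System.out.println(lengths + " " + longest); // prints "[1, 3, 2] 3"

        // Ill-typed: Object does not satisfy T extends Comparable<? super T>,
        // so a correct compiler must reject the following line.
        // Object bad = maxOf(Arrays.asList(new Object()));
    }
}
```

Each feature here (bounds, wildcards with capture conversion, inference across a method reference) exercises a different part of the type checker, which is one plausible reason why test cases combining them are effective at exposing typing-related bugs.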

We believe that our study opens up a new research direction, encouraging future researchers to build appropriate methods and techniques for more holistic compiler testing.

Supplemental Material

Auxiliary Presentation Video

This is a presentation video for the OOPSLA 2021 paper titled “Well-Typed Programs Can Go Wrong: A Study of Typing-Related Bugs in JVM Compilers,” accepted in the research track.
