
Fast shadow execution for debugging numerical errors using error free transformations

Published: 31 October 2022

Abstract

This paper proposes EFTSanitizer, a fast shadow execution framework for detecting and debugging numerical errors during late stages of testing, especially for long-running applications. Any shadow execution framework needs an oracle to compare against the floating point (FP) execution. This paper makes a case for using error free transformations, which are sequences of operations that compute the error of a primitive operation using existing hardware-supported FP operations, as an oracle for shadow execution. Although the error of a single correctly rounded FP operation is bounded, the accumulation of errors across operations can result in exceptions, slow convergence, and even crashes. To ease the job of debugging such errors, EFTSanitizer provides a directed acyclic graph (DAG) that highlights the propagation of the errors that result in exceptions or crashes. Unlike prior work, DAGs produced by EFTSanitizer include operations that span multiple function calls while keeping memory usage bounded. To enable the use of such shadow execution tools with long-running applications, EFTSanitizer also supports starting the shadow execution at an arbitrary point in the dynamic execution, which we call selective shadow execution. EFTSanitizer is an order of magnitude faster than prior state-of-the-art shadow execution tools such as FPSanitizer and Herbgrind. We have discovered new numerical errors and debugged them using EFTSanitizer.
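To make the idea of an error free transformation concrete, the sketch below shows Knuth's TwoSum, a well-known transformation for addition, together with a toy loop that accumulates the per-operation rounding errors alongside an ordinary FP sum. This is an illustration only and is not taken from EFTSanitizer: the function names `two_sum` and `shadow_sum` are hypothetical, and the tool's actual instrumentation, metadata, and DAG construction are not shown.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum fl(a + b)
    and e is its rounding error, so that a + b == s + e exactly
    (barring overflow). Only ordinary FP additions/subtractions are used."""
    s = a + b
    b_virtual = s - a            # the part of b that made it into s
    a_virtual = s - b_virtual    # the part of a that made it into s
    b_roundoff = b - b_virtual   # what was lost from b
    a_roundoff = a - a_virtual   # what was lost from a
    return s, a_roundoff + b_roundoff


def shadow_sum(values):
    """Hypothetical shadow-style accumulation: compute the usual FP sum
    and, next to it, a running total of the rounding errors reported by
    two_sum. The error total is itself kept in FP, so it is an estimate."""
    total = 0.0
    shadow_error = 0.0
    for v in values:
        total, e = two_sum(total, v)
        shadow_error += e
    return total, shadow_error


if __name__ == "__main__":
    s, e = two_sum(1.0, 1e-16)
    # Exact rational arithmetic confirms that nothing was lost: a + b == s + e.
    print(Fraction(1.0) + Fraction(1e-16) == Fraction(s) + Fraction(e))
    print(shadow_sum([0.1] * 10))
```

The point of the sketch is that the error term comes from a handful of hardware FP operations rather than from a high-precision software library, which is the property the abstract appeals to when proposing such transformations as the oracle.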

