Abstract
This paper proposes EFTSanitizer, a fast shadow execution framework for detecting and debugging numerical errors during the late stages of testing, especially for long-running applications. Any shadow execution framework needs an oracle to compare against the floating point (FP) execution. This paper makes a case for using error-free transformations (EFTs) as that oracle: an EFT is a sequence of operations that computes the rounding error of a primitive operation using only existing hardware-supported FP operations. Although the error of a single correctly rounded FP operation is bounded, the accumulation of errors across operations can result in exceptions, slow convergence, and even crashes. To ease the job of debugging such errors, EFTSanitizer provides a directed acyclic graph (DAG) that highlights the propagation of the errors that lead to exceptions or crashes. Unlike prior work, the DAGs produced by EFTSanitizer include operations that span multiple function calls while keeping memory usage bounded. To enable the use of such shadow execution tools with long-running applications, EFTSanitizer also supports starting the shadow execution at an arbitrary point in the dynamic execution, which we call selective shadow execution. EFTSanitizer is an order of magnitude faster than prior state-of-the-art shadow execution tools such as FPSanitizer and Herbgrind. We have discovered new numerical errors and debugged them using EFTSanitizer.
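To make the idea of an error-free transformation concrete, the following is a minimal sketch of the classic TwoSum transformation for addition, together with a toy accumulator that carries the rounding error alongside the FP result. The names `two_sum` and `shadow_sum` are illustrative assumptions, not EFTSanitizer's API, and the sketch assumes IEEE 754 double precision with round-to-nearest and no overflow. The tool itself instruments compiled programs, so its implementation will differ.

```python
from fractions import Fraction


def two_sum(a, b):
    """TwoSum: return (s, e) where s is the rounded sum fl(a + b)
    and e is the rounding error, so that s + e == a + b exactly
    (assuming round-to-nearest and no overflow)."""
    s = a + b
    b_virtual = s - a            # the part of b that made it into s
    a_virtual = s - b_virtual    # the part of a that made it into s
    b_round = b - b_virtual      # what was lost from b
    a_round = a - a_virtual      # what was lost from a
    return s, a_round + b_round


def shadow_sum(values):
    """Toy shadow execution (hypothetical helper): sum in plain FP
    while accumulating the per-operation rounding errors separately."""
    total = 0.0
    error = 0.0
    for v in values:
        total, e = two_sum(total, v)
        error += e
    return total, error


if __name__ == "__main__":
    s, e = two_sum(0.1, 0.2)
    # Exact rational arithmetic confirms that no information was lost.
    print(Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2))
    # The FP result is 0.0, but the tracked error recovers the lost 1.0.
    print(shadow_sum([1e16, 1.0, -1e16]))
```

In the second call the plain FP sum cancels to 0.0 while the tracked error term holds the 1.0 that was rounded away, which is the kind of discrepancy a shadow execution oracle is meant to surface.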
Index Terms
Fast shadow execution for debugging numerical errors using error free transformations