Demystifying Soft-Error Mitigation by Control-Flow Checking -- A New Perspective on its Effectiveness

Published: 27 September 2017

Abstract

Soft errors are a challenging and urgent problem in the domain of safety-critical embedded systems. For decades, checking schemes have been investigated and improved to mitigate soft-error effects for the class of control-flow faults, with current industrial standards strongly recommending their use.

However, reality looks different: Taking a systems perspective, we implemented four representative Control-Flow Checking (CFC) schemes and put them through their paces in 396 fault-injection campaigns. In contrast to previous work, which typically relied on probability-based vulnerability metrics, we accounted for the influence of memory and time overheads on the fault-space dimensions and applied those in full-scan fault injections. This change in procedure alone severely degraded the perceived effectiveness of CFC.
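The difference between a probability-based metric and a count over the full fault space can be illustrated with a minimal sketch. It assumes a uniform single-bit-flip fault model in which every (cycle, memory bit) pair is one point of the fault space. The `Variant` class and all numbers are hypothetical and chosen only for illustration; they are not results from the article.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One program variant under an assumed uniform single-bit-flip model."""
    name: str
    runtime_cycles: int  # execution time in cycles (hypothetical value)
    memory_bits: int     # RAM footprint in bits (hypothetical value)
    failures: int        # fault-space points that end in a failure

    @property
    def fault_space(self) -> int:
        # Every (cycle, bit) pair is one possible fault location.
        return self.runtime_cycles * self.memory_bits

    @property
    def coverage(self) -> float:
        # Probability-based metric: share of injected faults without failure.
        return 1.0 - self.failures / self.fault_space


# Hypothetical baseline and a hardened variant with time and memory overhead.
baseline = Variant("baseline", runtime_cycles=1_000, memory_bits=800,
                   failures=80_000)
hardened = Variant("hardened", runtime_cycles=1_500, memory_bits=1_000,
                   failures=105_000)

for v in (baseline, hardened):
    print(f"{v.name}: fault space={v.fault_space}, "
          f"coverage={v.coverage:.1%}, absolute failures={v.failures}")
```

In this made-up example, coverage rises from 90% to 93% while the absolute number of failing fault-space points grows from 80,000 to 105,000, because the overhead enlarges the fault space from 800,000 to 1,500,000 points. A comparison based on coverage alone would rate the hardened variant as better, whereas the absolute count rates it as worse.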

In addition, we expanded the perspective to data-flow faults and their influence on the overall susceptibility, an aspect that so far has been largely ignored. Our results suggest that, without accompanying measures, any improvement regarding control-flow faults is dominated by the increase in data faults caused by the increased attack surface in terms of memory and runtime overhead. Moreover, CFC performance depended less on the detection capabilities than on general aspects of the concrete binary compilation and execution.

In conclusion, incorporating CFC is not as straightforward as often assumed, and the vulnerability of systems with hardened control flow may in many cases even be increased by the schemes themselves.
