Abstract
Soft errors are a challenging and pressing problem in the domain of safety-critical embedded systems. For decades, checking schemes have been investigated and refined to mitigate the effects of soft errors that manifest as control-flow faults, and current industrial standards strongly recommend their use.
However, reality looks different. Taking a systems perspective, we implemented four representative Control-Flow Checking (CFC) schemes and put them through their paces in 396 fault-injection campaigns. In contrast to previous work, which typically relied on probability-based vulnerability metrics, we accounted for the influence of memory and runtime overheads on the dimensions of the fault space and performed full-scan fault injection over the complete fault space of each variant. This change in procedure alone severely degraded the perceived effectiveness of CFC.
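As a concrete picture of what a CFC scheme does, the following is a minimal Python simulation of signature-based checking in the style of CFCSS (Oh, Shirvani, and McCluskey, 2002). The abstract does not name the four schemes that were evaluated, so this is an illustrative stand-in, not necessarily one of them. The four-block control-flow graph, the signature values, and the names `enter_block` and `run_path` are hypothetical; a real implementation inserts the check code into every basic block of the compiled program instead of simulating block entries.

```python
from dataclasses import dataclass

# Hypothetical CFG: A -> B, A -> C, B -> D, C -> D (D is a branch-fan-in node).
SIG = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}  # unique s_i per block

# Signature differences d_j = s_pred XOR s_j; for the fan-in node D the
# first predecessor (B) is used.
DIFF = {
    "A": 0,                    # entry block: G starts out as s_A
    "B": SIG["A"] ^ SIG["B"],
    "C": SIG["A"] ^ SIG["C"],
    "D": SIG["B"] ^ SIG["D"],
}
FAN_IN = {"D"}

# Adjusting signature each predecessor of D loads: s_first_pred XOR s_self.
ADJUST = {"B": SIG["B"] ^ SIG["B"], "C": SIG["B"] ^ SIG["C"]}


@dataclass
class CfcState:
    g: int = SIG["A"]  # run-time signature register G
    adj: int = 0       # run-time adjusting signature D (used at fan-in nodes)


def enter_block(state: CfcState, block: str) -> bool:
    """Check code at the top of a block; returns False on a detected error."""
    state.g ^= DIFF[block]
    if block in FAN_IN:
        state.g ^= state.adj
    if state.g != SIG[block]:
        return False
    # Predecessors of D set the adjusting signature; other blocks keep it.
    state.adj = ADJUST.get(block, state.adj)
    return True


def run_path(path: list[str]) -> bool:
    """Executes blocks in order; True if every signature check passes."""
    state = CfcState()
    return all(enter_block(state, block) for block in path)


print(run_path(["A", "B", "D"]))  # legal path
print(run_path(["A", "C", "D"]))  # legal path via the second predecessor of D
print(run_path(["A", "D"]))       # illegal jump that skips B and C
```

The check catches jumps along edges that do not exist in the graph, such as A to D or B to C. A corrupted branch condition that sends A to C instead of B, however, follows a legal edge and passes every check: the fault originates in the data flow, outside what the signatures can observe.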
In addition, we extended the perspective to data-flow faults and their influence on overall susceptibility, an aspect that has so far been largely ignored. Our results suggest that, without accompanying measures, any improvement regarding control-flow faults is dominated by the rise in data faults, which stems from the larger attack surface created by the schemes' memory and runtime overheads. Moreover, CFC performance depended less on the schemes' detection capabilities than on general aspects of how the concrete binary was compiled and executed.
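The following sketch illustrates why the fault-space dimensions matter when comparing a baseline with a hardened variant, following the absolute-failure-count argument of Schirmeier, Borchert, and Spinczyk (DSN 2015). All numbers are invented for illustration and are not results of this study; the `Variant` type and its fields are hypothetical. It assumes a uniform single-bit-flip fault model, in which the fault space is memory bits times runtime cycles, and it assumes that data-flow failures grow in proportion to the fault space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical full-scan fault-injection summary for one program binary."""
    name: str
    memory_bits: int     # bits of memory/registers that a fault can hit
    runtime_cycles: int  # length of the fault-free run in CPU cycles
    cf_failures: int     # fault-space points leading to a control-flow failure
    data_failures: int   # fault-space points leading to a data-flow failure

    @property
    def fault_space(self) -> int:
        # Uniform single-bit-flip model: one fault per (bit, cycle) coordinate.
        return self.memory_bits * self.runtime_cycles

    @property
    def failures(self) -> int:
        # Absolute failure count, comparable across variants of different size.
        return self.cf_failures + self.data_failures

    @property
    def failure_probability(self) -> float:
        # Probability-based metric: blind to the size of the fault space.
        return self.failures / self.fault_space


# Invented numbers: CFC removes 75 % of the control-flow failures, but the
# hardened binary needs 1.5x memory and 1.4x runtime (fault space grows 2.1x).
baseline = Variant("baseline", 1_000, 10_000, cf_failures=200_000, data_failures=300_000)
HARDENED_BITS, HARDENED_CYCLES = 1_500, 14_000
hardened_space = HARDENED_BITS * HARDENED_CYCLES
hardened = Variant(
    "hardened", HARDENED_BITS, HARDENED_CYCLES, cf_failures=50_000,
    # Assumption: data-flow failures scale linearly with the fault space.
    data_failures=baseline.data_failures * hardened_space // baseline.fault_space,
)

for v in (baseline, hardened):
    print(f"{v.name:<9} P(fail)={v.failure_probability:.4f} failures={v.failures}")
```

With these made-up figures, the failure probability drops from 5.0 % to about 3.2 %, whereas the absolute failure count rises from 500,000 to 680,000. Judged by probability, the hardened variant looks better; under a uniform fault rate, it would fail more often.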
In conclusion, incorporating CFC is not as straightforward as often assumed, and in many cases the schemes themselves may even increase the vulnerability of systems with hardened control flow.
Demystifying Soft-Error Mitigation by Control-Flow Checking -- A New Perspective on its Effectiveness