Abstract
To avoid catastrophic events such as unrecoverable system failures caused by soft errors on mobile and embedded systems, software-based error detection and compensation techniques have been proposed. Methods such as error-correction codes or redundant execution offer high flexibility and allow application-specific selection of fault tolerance without the need for special hardware support. However, such software-based approaches may overload the system because of their execution-time overhead. An adaptive deployment of these techniques that meets both application requirements and system constraints is therefore desirable. From our case study, we observe that a control task can tolerate a limited number of errors with acceptable performance loss. Such tolerance can be modeled as an (m,k) constraint, which requires at least m out of any k consecutive runs to be correct. In this paper, we discuss how a given (m,k) constraint can be satisfied by adopting patterns of task instances with individual error detection and compensation capabilities. We introduce static strategies and provide a formal feasibility analysis for validation. Furthermore, we develop an adaptive scheme that extends our initial approach with online awareness, which increases efficiency while preserving the analysis results. The effectiveness of our method is shown in a real-world case study as well as for synthesized task sets.
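To make the (m,k) constraint concrete, the following Python sketch checks whether a sequence of run outcomes satisfies it and illustrates two hypothetical ways of deciding which task instances receive error detection and compensation. It rests on assumptions that are not taken from the paper: a protected instance always yields a correct result, and an unprotected instance is incorrect whenever a soft error strikes it. The static pattern is the evenly distributed (m,k) pattern of Ramanathan (1999), used here as a stand-in rather than the authors' static strategies, and `must_protect` is an illustrative greedy rule, not the adaptive scheme described above. All function names are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple


def satisfies_mk(outcomes: Iterable[bool], m: int, k: int) -> bool:
    """True if every k consecutive runs contain at least m correct runs.

    outcomes[i] is True when run i produced a correct result. Windows at the
    start (fewer than k runs so far) are checked as if preceded by correct runs.
    """
    if k < 1 or not 0 <= m <= k:
        raise ValueError("require k >= 1 and 0 <= m <= k")
    window: deque = deque()
    misses = 0
    for ok in outcomes:
        window.append(ok)
        if not ok:
            misses += 1
        if len(window) > k:          # slide: drop the oldest run
            if not window.popleft():
                misses -= 1
        if misses > k - m:           # too many incorrect runs in this window
            return False
    return True


def protection_pattern(n: int, m: int, k: int) -> List[bool]:
    """Static pattern (assumption: Ramanathan's evenly distributed pattern).

    Instance j is protected iff j == floor(ceil(j*m/k) * k/m).
    """
    if k < 1 or not 0 <= m <= k:
        raise ValueError("require k >= 1 and 0 <= m <= k")
    if m == 0:
        return [False] * n
    pattern = []
    for j in range(n):
        a = -(-j * m // k)                   # ceil(j*m/k) with integer arithmetic
        pattern.append(j == (a * k) // m)    # floor(a*k/m)
    return pattern


def must_protect(history: List[bool], m: int, k: int) -> bool:
    """Hypothetical greedy online rule: protect the next instance only if an
    incorrect unprotected run would violate (m,k) given the last k-1 outcomes."""
    recent = history[-(k - 1):] if k > 1 else []
    misses = sum(1 for ok in recent if not ok)
    return misses + 1 > k - m


def simulate_adaptive(errors: List[bool], m: int, k: int) -> Tuple[List[bool], int]:
    """errors[i] is True if a soft error would corrupt instance i when unprotected.

    Returns the outcome history and the number of protected instances.
    """
    history: List[bool] = []
    protected = 0
    for err in errors:
        if must_protect(history, m, k):
            protected += 1
            history.append(True)     # assumption: protection always succeeds
        else:
            history.append(not err)  # unprotected run is correct unless hit
    return history, protected


if __name__ == "__main__":
    pattern = protection_pattern(6, 2, 3)
    print(pattern)                        # [True, True, False, True, True, False]
    # Worst case for the static pattern: every unprotected instance fails.
    print(satisfies_mk(pattern, 2, 3))    # True
    print(simulate_adaptive([False] * 6, 2, 3))  # ([True, True, True, True, True, True], 0)
```

Under these assumptions the static pattern protects 4 of every 6 instances for (2,3) regardless of whether errors actually occur, whereas the greedy rule protects an instance only when the recent history leaves no slack, so in an error-free run it protects none.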
Compensate or ignore? Meeting control robustness requirements through adaptive soft-error handling
LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems