Abstract
Fault tolerance is rapidly becoming one of the most significant design objectives for embedded systems, owing to shrinking semiconductor structures and reduced supply voltages. However, resource-constrained systems cannot afford traditional error correction because of its overhead and cost. New methods are required that sustain acceptable service quality in the presence of errors while avoiding crashes.
We present a flexible fault-tolerance approach that selects correction actions according to error semantics, using application annotations and static analysis. We demonstrate the validity of our approach by analyzing the vulnerability of an H.264 decoder and improving its reliability through flexible error handling.
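The abstract does not show the annotation syntax or the output of the static analysis, so the following C fragment is only a minimal sketch of the general idea of semantics-dependent error handling, under stated assumptions. The `UNRELIABLE` macro, the `region_info` table, and `select_action` are hypothetical names, not the authors' interface. The table stands in for the mapping that annotations plus static analysis would provide, and detecting the error itself (for example, by a memory checker that reports a faulty address) is assumed to happen elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical annotation: expands to nothing for a standard C compiler,
   but an analysis tool could treat it as a type qualifier. */
#define UNRELIABLE

typedef enum { DATA_UNRELIABLE, DATA_RELIABLE } data_class;
typedef enum { ACTION_IGNORE, ACTION_RECOVER } error_action;

/* Hypothetical record linking a memory region to its annotated class.
   Assumption: in practice this mapping would be derived from annotations
   checked by static analysis rather than written by hand. */
typedef struct {
    const char *name;
    const void *base;
    size_t size;
    data_class cls;
} region_info;

/* Example data: pixel samples may tolerate errors, control data may not. */
static UNRELIABLE unsigned char frame_pixels[16 * 16]; /* decoded samples */
static int macroblock_count = 99;                       /* control data */

static const region_info regions[] = {
    { "frame_pixels", frame_pixels, sizeof frame_pixels, DATA_UNRELIABLE },
    { "macroblock_count", &macroblock_count, sizeof macroblock_count, DATA_RELIABLE },
};

/* Select a correction action for an error reported at address addr.
   Addresses are compared as integers to avoid relational comparisons
   between pointers into different objects. */
error_action select_action(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        uintptr_t b = (uintptr_t)regions[i].base;
        if (a >= b && a - b < regions[i].size)
            return regions[i].cls == DATA_UNRELIABLE ? ACTION_IGNORE
                                                     : ACTION_RECOVER;
    }
    return ACTION_RECOVER; /* unknown location: be conservative */
}
```

With this mapping, an error reported inside `frame_pixels` might be tolerated as a local visual artifact, while an error in `macroblock_count` triggers a recovery action; addresses outside every known region default to recovery as the conservative choice.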
Improving the fault resilience of an H.264 decoder using static analysis methods