Compiler-assisted soft error detection under performance and energy constraints in embedded systems

Published: 24 July 2009

Abstract

Soft errors induced by terrestrial radiation are becoming a significant concern in architectures designed in newer technologies. If left undetected, these errors can have catastrophic consequences or cause costly maintenance problems in embedded applications. In this article, we focus on using the compiler to duplicate instructions for error detection in VLIW datapaths. The instruction duplication mechanism is further supported by a hardware enhancement for efficient result verification, which avoids the need for additional comparison instructions. In the proposed approach, the compiler determines the instruction schedule by balancing the permissible performance degradation and the energy constraint against the required degree of duplication. Our experimental results show that our algorithms allow the designer to perform trade-off analysis between performance, reliability, and energy consumption.
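To make the trade-off concrete, the following is a minimal illustrative sketch, not the article's algorithm. It assumes a simplified VLIW schedule in which every cycle has `width` issue slots, ignores data dependencies and functional-unit types, and uses hypothetical names (`Instr`, `duplicate_under_budget`, `max_extra_cycles`, `max_extra_energy`). A duplicate is placed in a free slot when one exists; otherwise a new cycle is appended, but only while the permitted slowdown and energy budget allow it.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    cycle: int     # cycle of the original instruction in the schedule
    energy: float  # hypothetical energy cost of executing one extra copy


def duplicate_under_budget(instrs, num_cycles, width,
                           max_extra_cycles, max_extra_energy):
    """Greedily choose which instructions to duplicate.

    Returns (placements, total_cycles, extra_energy), where placements
    maps an instruction name to the cycle in which its duplicate issues.
    """
    # Count the slots already occupied by the original instructions.
    used = [0] * num_cycles
    for ins in instrs:
        used[ins.cycle] += 1

    placements = {}
    extra_energy = 0.0
    for ins in instrs:
        if extra_energy + ins.energy > max_extra_energy:
            continue  # duplicating this one would exceed the energy budget
        # Prefer a free slot at or after the original's cycle (no slowdown).
        slot = next((c for c in range(ins.cycle, len(used))
                     if used[c] < width), None)
        if slot is None:
            if len(used) - num_cycles >= max_extra_cycles:
                continue  # no free slot and no cycles left to add
            used.append(0)  # lengthen the schedule by one cycle
            slot = len(used) - 1
        used[slot] += 1
        placements[ins.name] = slot
        extra_energy += ins.energy
    return placements, len(used), extra_energy


if __name__ == "__main__":
    program = [Instr("a", 0, 1.0), Instr("b", 0, 1.0), Instr("c", 1, 1.0)]
    placements, cycles, energy = duplicate_under_budget(
        program, num_cycles=2, width=2,
        max_extra_cycles=1, max_extra_energy=2.5)
    coverage = len(placements) / len(program)
    print(placements, cycles, energy, coverage)
```

Sweeping `max_extra_cycles` and `max_extra_energy` and recording the fraction of duplicated instructions gives a rough picture of the performance, reliability, and energy trade-off space that the abstract describes.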
