Abstract
Soft errors induced by terrestrial radiation are becoming a significant concern in architectures designed in newer technologies. If left undetected, these errors can have catastrophic consequences or cause costly maintenance problems in embedded applications. In this article, we use the compiler to duplicate instructions for error detection in VLIW datapaths. The instruction duplication mechanism is supported by a hardware enhancement for efficient result verification, which avoids the need for additional comparison instructions. In the proposed approach, the compiler determines the instruction schedule by balancing the permissible performance degradation and the energy constraint against the required degree of duplication. Our experimental results show that the algorithms allow the designer to analyze trade-offs among performance, reliability, and energy consumption.
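The balancing act described in the abstract can be illustrated with a small sketch. This is not the article's algorithm: it is a simplified greedy pass written for illustration, and the names `Op`, `duplicate_ops`, `max_extra_cycles`, and `energy_budget` are hypothetical. It assumes a fixed issue width, a per-operation energy cost, and hardware that compares an operation with its duplicate, so no comparison operations are inserted. It ignores data dependences and functional-unit types, which a real VLIW scheduler must respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    energy: float  # hypothetical energy cost of executing the op once


def duplicate_ops(schedule, width, max_extra_cycles, energy_budget):
    """Greedily duplicate ops into free issue slots.

    schedule: list of bundles (one list of Op per cycle).
    width: number of issue slots per cycle.
    max_extra_cycles: permitted schedule lengthening (performance bound).
    energy_budget: upper bound on total energy, originals plus duplicates.
    Returns (new_schedule, names_of_duplicated_ops).
    """
    bundles = [list(b) for b in schedule]
    limit = len(schedule) + max_extra_cycles
    spent = sum(op.energy for b in schedule for op in b)
    duplicated = []
    for cycle, bundle in enumerate(schedule):
        for op in bundle:
            # Skip the duplicate if it would exceed the energy budget.
            if spent + op.energy > energy_budget:
                continue
            # Look for the first cycle at or after the original with a free slot.
            target = cycle
            while target < len(bundles) and len(bundles[target]) >= width:
                target += 1
            if target == len(bundles):
                # No free slot: extend the schedule only if the bound allows it.
                if target >= limit:
                    continue
                bundles.append([])
            bundles[target].append(Op(op.name + "'", op.energy))
            spent += op.energy
            duplicated.append(op.name)
    return bundles, duplicated


if __name__ == "__main__":
    sched = [[Op("add", 1.0), Op("mul", 2.0)], [Op("ld", 1.5)]]
    new_sched, dups = duplicate_ops(sched, width=2, max_extra_cycles=1,
                                    energy_budget=100.0)
    for i, b in enumerate(new_sched):
        print(i, [op.name for op in b])
    print("duplicated:", dups)
```

Tightening `max_extra_cycles` or `energy_budget` reduces the number of duplicated operations, which is the kind of performance, reliability, and energy trade-off the abstract refers to.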
Index Terms
Compiler-assisted soft error detection under performance and energy constraints in embedded systems