ABSTRACT
Transient faults are expected to be a major design consideration in future microprocessors. Recent proposals for transient fault detection in processor cores have revolved around the idea of redundant threading, which involves redundant execution of a program across multiple execution contexts. This paper presents a new approach to redundant threading by bringing together the concepts of slice-level execution and value and control-flow locality into a novel partial redundant threading mechanism called SlicK.

The purpose of redundant execution is to check the integrity of the outputs propagating out of the core (typically through stores). SlicK implements redundancy at the granularity of backward slices of these output instructions and exploits value and control-flow locality to avoid redundantly executing slices that lead to predictable outputs. This avoids redundant execution of a significant fraction of instructions while maintaining extremely low vulnerabilities for critical processor structures.

We propose the microarchitecture of a backward-slice extractor called SliceEM that is able to identify backward slices without interrupting the instruction flow, and show how this extractor and a set of predictors can be integrated into a redundant threading mechanism to form SlicK. Detailed simulations with SPEC CPU2000 benchmarks show that SlicK can provide around 10.2% performance improvement over a well-known redundant threading mechanism, buying back over 50% of the loss suffered due to redundant execution. SlicK can keep the Architectural Vulnerability Factors of processor structures to typically 0%-2%. More importantly, SlicK's slice-based mechanisms provide future opportunities for exploring interesting points in the performance-reliability design space based on market segment needs.
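The two ideas the abstract combines, backward slices of stores and predictability of store outputs, can be illustrated with a small software sketch. This is not the SliceEM hardware or the predictors evaluated in the paper: the `Instr` record, the `backward_slice` function, and the `LastValuePredictor` class are hypothetical names introduced here, the slice follows register dependences only (memory dependences are ignored), and a simple last-value predictor with a confidence counter stands in for whatever predictors SlicK actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical trace record: opcode, destination register, source registers."""
    op: str
    dest: Optional[str]          # None for stores, which write memory, not a register
    srcs: Tuple[str, ...]


def backward_slice(trace: List[Instr], store_idx: int) -> List[int]:
    """Indices of the instructions that the store at store_idx depends on.

    Walks the trace backwards, tracking the set of registers whose producers
    are still being looked for. Register dependences only (an assumption).
    """
    live = set(trace[store_idx].srcs)
    members = [store_idx]
    for i in range(store_idx - 1, -1, -1):
        ins = trace[i]
        if ins.dest is not None and ins.dest in live:
            live.discard(ins.dest)   # producer found for this register
            live.update(ins.srcs)    # now look for the producers of its inputs
            members.append(i)
    return sorted(members)


class LastValuePredictor:
    """Illustrative last-value predictor with a saturating confidence counter."""

    def __init__(self, threshold: int = 2, max_conf: int = 3) -> None:
        self.table: Dict[int, Tuple[Optional[int], int]] = {}
        self.threshold = threshold
        self.max_conf = max_conf

    def check(self, pc: int, value: int) -> bool:
        """True if the store value was confidently predicted.

        In this sketch a True result means the store's slice would be
        skipped by the redundant thread; False means it is re-executed.
        """
        last, conf = self.table.get(pc, (None, 0))
        hit = last == value
        confident = hit and conf >= self.threshold
        self.table[pc] = (value, min(conf + 1, self.max_conf) if hit else 0)
        return confident


# Example trace: the mul at index 2 does not feed the store at index 3.
TRACE = [
    Instr("load", "r1", ("r0",)),
    Instr("add", "r2", ("r1", "r3")),
    Instr("mul", "r4", ("r5", "r6")),
    Instr("store", None, ("r2", "r7")),
]
SLICE = backward_slice(TRACE, 3)
```

For this trace, `SLICE` holds the load, the add, and the store itself, and the unrelated `mul` is left out. A redundant thread built along these lines would re-execute only the slice members, and only for stores whose value `check` reports as not confidently predicted.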
Index Terms
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 2006 ASPLOS Conference