Abstract
Because of their large footprint, on-chip last-level caches are among the components of multi-core systems most vulnerable to soft errors. However, this vulnerability depends strongly on the configuration and parameters of the last-level cache, especially when different applications execute concurrently. In this article, we propose a novel reliability-aware reconfigurable last-level cache architecture (R2Cache) and a cache vulnerability model for multi-cores. R2Cache supports various cache configurations (i.e., cache parameter selection and cache partitioning) that are efficient with respect to reliability for different concurrently executing applications. The proposed vulnerability model takes into account the vulnerability of both the data and tag arrays, as well as the active cache area for applications in different execution phases. To enable runtime adaptations, we introduce a lightweight online vulnerability predictor that exploits performance metrics, such as the number of L2 misses, to accurately estimate the cache vulnerability to soft errors. Based on the vulnerabilities predicted for the concurrently executing applications in the current execution epoch, our runtime reliability manager reconfigures the cache such that, for the next execution epoch, the total vulnerability of all concurrently executing applications is minimized under user-provided tolerable performance/energy overheads. In scenarios where single-bit error correction for cache lines can be afforded, vulnerability-aware reconfigurations can be leveraged to increase the reliability of the last-level cache against multi-bit errors. Compared to state-of-the-art vulnerability-minimizing and reconfigurable caches, the proposed architecture provides 35.27% and 23.42% vulnerability savings, respectively, averaged across numerous experiments, and reduces vulnerability by more than 65% and 60%, respectively, for selected applications and application phases.
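The per-epoch decision described above, choosing a cache configuration for each application so that total predicted vulnerability is minimized within a tolerable overhead, can be illustrated with a minimal sketch. This is not the algorithm from the article: the `Config` record, the `choose_configs` function, the exhaustive search, the per-application overhead bound, and all numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """Hypothetical candidate configuration for one application."""
    name: str
    ways: int             # cache ways this configuration would occupy
    vulnerability: float  # predicted vulnerability for the next epoch (arbitrary units)
    overhead: float       # predicted performance overhead as a fraction of baseline


def choose_configs(candidates, total_ways, max_overhead):
    """Pick one configuration per application by exhaustive search.

    Assumption: a combination is feasible if its partitions fit into the
    shared cache and every application stays within max_overhead.
    """
    best, best_cost = None, float("inf")
    for combo in product(*candidates):
        if sum(c.ways for c in combo) > total_ways:
            continue  # partitions do not fit into the shared cache
        if any(c.overhead > max_overhead for c in combo):
            continue  # violates the tolerable overhead
        cost = sum(c.vulnerability for c in combo)
        if cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost


# Illustrative values only; they are not measurements from the article.
candidates = [
    [Config("A-2way", 2, 9.0, 0.08), Config("A-4way", 4, 6.0, 0.03), Config("A-6way", 6, 5.0, 0.01)],
    [Config("B-2way", 2, 4.0, 0.04), Config("B-4way", 4, 3.5, 0.02), Config("B-6way", 6, 7.0, 0.00)],
]
best, cost = choose_configs(candidates, total_ways=8, max_overhead=0.05)
print([c.name for c in best], cost)
```

With these made-up inputs, the search gives application A six ways and application B two ways. Exhaustive enumeration grows exponentially with the number of applications, so it only serves here to make the objective and the constraints explicit.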
Index Terms
Reliability-Aware Adaptations for Shared Last-Level Caches in Multi-Cores