ABSTRACT
We present a general scheme for virtualizing main memory error-correction mechanisms, which maps the redundant information needed to correct errors into the memory namespace itself. This basic idea increases flexibility, which we use to increase error protection capabilities, improve power efficiency, and reduce system cost, with only small performance overheads. We augment the virtual memory system architecture to detach the physical mapping of data from the physical mapping of its associated ECC information. We then use this mechanism to develop two-tiered error protection techniques that separate the process of detecting errors from the rare need to also correct errors, and thus save energy. We describe how to provide strong chipkill and double-chipkill protection using existing DRAM and packaging technology. We show how to maintain access granularity and redundancy overheads, even when using ×8 DRAM chips. We also evaluate error correction for systems that do not use ECC DIMMs. Overall, analysis of demanding SPEC CPU 2006 and PARSEC benchmarks indicates that performance overhead is only 1% with ECC DIMMs and less than 10% with standard non-ECC DIMM configurations, that DRAM power savings can be as high as 27%, and that the system energy-delay product is improved by 12% on average.
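The two-tiered idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the scheme evaluated in the paper: the class name `VirtualizedEccMemory`, the fields `ecc_map` and `t2_reads`, the 64-byte line split into 8-byte chunks, and the choice of CRC-32 for detection and XOR parity for correction are all hypothetical stand-ins chosen for brevity. What it tries to show is the structure: detection information is checked on every read, while correction information lives at a separately mapped address in the same memory namespace and is fetched only when an error is detected.

```python
import zlib

LINE_BYTES = 64   # assumed cache-line size
CHUNK_BYTES = 8   # assumed: one chunk per DRAM chip


class VirtualizedEccMemory:
    """Toy model: tier-1 detects on every read, tier-2 corrects on demand."""

    def __init__(self):
        self.mem = {}        # address -> bytes; holds data AND tier-2 info
        self.t1 = {}         # data address -> per-chunk detection codes
        self.ecc_map = {}    # data address -> address of its tier-2 info
        self.next_ecc_addr = 1 << 20  # hypothetical region for tier-2 info
        self.t2_reads = 0    # how often the correction info was fetched

    @staticmethod
    def _chunks(line):
        return [line[i:i + CHUNK_BYTES] for i in range(0, len(line), CHUNK_BYTES)]

    @staticmethod
    def _xor(chunks):
        out = bytearray(CHUNK_BYTES)
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                out[i] ^= byte
        return bytes(out)

    def write(self, addr, line):
        assert len(line) == LINE_BYTES
        chunks = self._chunks(line)
        self.mem[addr] = line
        # Tier 1 (stand-in): a CRC-32 per chunk, checked on every read.
        self.t1[addr] = [zlib.crc32(c) for c in chunks]
        # Tier 2 (stand-in): XOR parity stored at a separately mapped address.
        if addr not in self.ecc_map:
            self.ecc_map[addr] = self.next_ecc_addr
            self.next_ecc_addr += CHUNK_BYTES
        self.mem[self.ecc_map[addr]] = self._xor(chunks)

    def read(self, addr):
        chunks = self._chunks(self.mem[addr])
        bad = [i for i, c in enumerate(chunks) if zlib.crc32(c) != self.t1[addr][i]]
        if not bad:
            return self.mem[addr]  # common case: tier-2 info is never touched
        if len(bad) > 1:
            raise RuntimeError("more than one bad chunk: uncorrectable in this toy")
        # Rare case: fetch the parity and rebuild the single bad chunk.
        self.t2_reads += 1
        parity = self.mem[self.ecc_map[addr]]
        good = [c for i, c in enumerate(chunks) if i != bad[0]]
        chunks[bad[0]] = self._xor(good + [parity])
        self.mem[addr] = b"".join(chunks)
        return self.mem[addr]
```

In this sketch an error-free read never consults `ecc_map`, which mirrors the claim that correction is a rare event kept off the common path. Because the mapping is a separate table, the amount or strength of tier-2 information could in principle differ per address without changing how data is laid out.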
Virtualized and flexible ECC for main memory
ASPLOS '10