Abstract
The dual effects of larger die sizes and technology scaling, combined with aggressive voltage scaling for power reduction, increase the error rates for on-chip memories. Traditional on-chip memory reliability techniques (e.g., ECC) incur significant power and performance overheads. In this article, we propose a low-power-and-performance-overhead Embedded RAID (E-RAID) strategy and present Embedded RAIDs-on-Chip (E-RoC), a distributed dynamically managed reliable memory subsystem for bus-based Chip-Multiprocessors. E-RoC achieves reliability through redundancy by optimizing RAID-like policies tuned for on-chip distributed memories. We achieve on-chip reliability of memories through the use of Distributed Dynamic ScratchPad Allocatable Memories (DSPAMs) and their allocation policies. We exploit aggressive voltage scaling to reduce power consumption overheads due to parallel DSPAM accesses, and rely on the E-RoC Manager to automatically handle any resulting voltage-scaling-induced errors. We demonstrate how E-RAIDs can further enhance the fault tolerance of traditional memory reliability approaches by designing E-RAID levels that exploit ECC. Finally, we show the power and flexibility of the E-RoC concept by demonstrating the benefits of heterogeneous E-RAID levels that fit each application's needs (fault tolerance, power/energy, performance).
Our experimental results on CHStone/Mediabench II benchmarks show that our E-RAID levels converge to 100% error-free data rates much faster than traditional ECC approaches. Moreover, E-RAID levels that exploit ECC can guarantee 99.9% error-free data rates at ultra-low Vdd on average, whereas traditional ECC approaches were able to attain at most 99.1% error-free data rates. We observe an average 22% increase in dynamic power consumption when using traditional ECC approaches with respect to the baseline (non-voltage-scaled SPMs), whereas our E-RAID levels reduce dynamic power consumption by an average of 27% (w.r.t. the same non-voltage-scaled SPM baseline), while incurring a worst-case 2% higher performance overhead than traditional ECC approaches. By voltage scaling the memories, we see that traditional ECC approaches reduce static energy by 6.4% (average), whereas our E-RAID approaches achieve 23.4% static energy savings (average). Finally, we observe that mixing E-RAID levels allows us to further reduce the dynamic power consumption by up to 55.5% at the cost of an average 5.6% increase in execution time over traditional approaches.
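To make the idea of RAID-like redundancy across distributed on-chip memories concrete, the following is a minimal sketch of generic XOR-parity striping over a few simulated scratchpad banks. It is not the E-RAID levels or the E-RoC Manager described in the article: the function names (`xor_parity`, `write_stripe`, `recover`) are hypothetical, banks are modeled as plain byte strings, and a faulty bank is assumed to have been detected already and marked as `None`.

```python
from functools import reduce
from typing import List, Optional


def xor_parity(blocks: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Place each data block in its own (simulated) bank and append a parity bank."""
    return list(data_blocks) + [xor_parity(data_blocks)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data blocks when at most one bank is marked faulty (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild at most one faulty bank")
    rebuilt = list(stripe)
    if missing:
        # XOR of all surviving banks (data and parity) yields the lost block.
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_parity(survivors)
    return rebuilt[:-1]  # drop the parity bank, return data only


stripe = write_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
stripe[1] = None  # assume bank 1 was flagged as faulty after voltage scaling
data = recover(stripe)
```

Single parity of this kind tolerates one known-faulty bank per stripe; locating the fault in the first place requires a separate detection mechanism, which is one reason combining redundancy with per-block codes such as ECC can be attractive.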
Index Terms
Embedded RAIDs-on-chip for bus-based chip-multiprocessors