NoC-based fault-tolerant cache design in chip multiprocessors

Published: 28 March 2014

Abstract

Advances in technology scaling make emerging Chip MultiProcessor (CMP) platforms increasingly susceptible to failures, raising a range of reliability challenges. In such platforms, error-prone on-chip memories (caches) continue to dominate the chip area. At the same time, Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these architectures. We present a novel solution for efficiently implementing fault-tolerant Last-Level Cache (LLC) designs in CMP architectures. The proposed approach leverages the interconnection network fabric to protect the LLC banks against permanent faults in an efficient and scalable way. During an LLC access to a faulty block, the network detects and corrects the faults, returning the fault-free data to the requesting core. By leveraging the NoC interconnection fabric, designers can implement any cache fault-tolerance scheme in an efficient, modular, and scalable manner for emerging multicore/manycore platforms. We propose four different policies for implementing a remapping-based fault-tolerant scheme on the NoC fabric in different settings. The proposed policies enable design trade-offs between NoC traffic (packets sent through the network) and the intrinsic parallelism of these communication mechanisms, allowing designers to tune the system to their design constraints. We perform an extensive design space exploration on NoC benchmarks to demonstrate the usability and efficacy of our approach. In addition, we perform a sensitivity analysis to observe how the policies respond to improvements in the NoC architecture. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.
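The general idea of remapping-based protection can be illustrated with a minimal sketch. It is not one of the four policies from the paper, whose details the abstract does not give. It assumes a hypothetical fault map (`RemapTable`) consulted on each LLC access, which redirects a request for a known-faulty block to a spare block that may live in another bank. All names, the 16-bank layout with 4 blocks per bank, and the redirect flag are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BlockAddr:
    """Location of a cache block: bank number and index within the bank."""
    bank: int
    index: int


@dataclass
class RemapTable:
    """Hypothetical fault map: faulty block -> fault-free spare block."""
    remap: Dict[BlockAddr, BlockAddr] = field(default_factory=dict)

    def mark_faulty(self, faulty: BlockAddr, spare: BlockAddr) -> None:
        # Record that accesses to `faulty` must be served from `spare`.
        self.remap[faulty] = spare

    def resolve(self, addr: BlockAddr) -> BlockAddr:
        # Fault-free blocks resolve to themselves.
        return self.remap.get(addr, addr)


def llc_read(banks: List[List[str]], table: RemapTable,
             addr: BlockAddr) -> Tuple[str, bool]:
    """Read a block, redirecting through the fault map when needed.

    Returns the data and a flag telling whether the request was redirected
    (a redirect to another bank would mean extra NoC traffic).
    """
    target = table.resolve(addr)
    return banks[target.bank][target.index], target != addr


# Illustrative setup: 16 banks of 4 blocks each (sizes are assumptions).
banks = [[f"data[{b}][{i}]" for i in range(4)] for b in range(16)]
table = RemapTable()
table.mark_faulty(BlockAddr(3, 1), BlockAddr(7, 0))

redirected_read = llc_read(banks, table, BlockAddr(3, 1))
normal_read = llc_read(banks, table, BlockAddr(2, 2))
```

In this sketch the requesting core never sees the faulty block: the lookup substitutes the spare location before the bank is read. Where such a table sits and how many packets a redirect costs are exactly the kind of trade-offs the abstract attributes to the proposed policies.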

