research-article

Enhancing FPGAs with Magnetic Tunnel Junction-Based Block RAMs

Published: 26 January 2018

Abstract

While plentiful on-chip memory is necessary for many designs to fully utilize an FPGA’s computational capacity, SRAM scaling is becoming more difficult because of increasing device variation. An alternative is to build FPGA block RAM (BRAM) from magnetic tunnel junctions (MTJs), as this emerging embedded memory has a small cell size, low energy usage, and good scalability. We conduct a detailed comparison study of SRAM and MTJ BRAMs that includes cell designs that are robust to device variation, transistor-level design and optimization of all the required BRAM-specific circuits, and variation-aware simulation at the 22 nm node. At a 256 Kb block size, MTJ-BRAM is 3.06× denser and 55% more energy efficient than its SRAM counterpart, and its Fmax is 274 MHz, which is adequate for most FPGA system clock domains. We also detail further enhancements that allow these 256 Kb MTJ BRAMs to operate at a higher speed of 353 MHz for streaming FIFOs, which are very common in FPGA designs, and describe how the non-volatility of MTJ BRAM enables novel on-chip configuration and power-down modes. For a RAM architecture similar to the latest commercial FPGAs, MTJ-BRAMs could expand FPGA memory capacity by 2.95× with no die size increase.

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
Special Section on FCCM 2016 and Regular Papers
March 2018
183 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3178391
Editor: Steve Wilton

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 26 January 2018
• Accepted: 1 October 2017
• Received: 1 June 2017

Qualifiers

• research-article
• Research
• Refereed
