Abstract
While plentiful on-chip memory is necessary for many designs to fully utilize an FPGA’s computational capacity, SRAM scaling is becoming more difficult because of increasing device variation. An alternative is to build FPGA block RAM (BRAM) from magnetic tunnel junctions (MTJs), as this emerging embedded memory offers a small cell size, low energy usage, and good scalability. We conduct a detailed comparison study of SRAM and MTJ BRAMs that includes cell designs that are robust to device variation, transistor-level design and optimization of all the required BRAM-specific circuits, and variation-aware simulation at the 22 nm node. At a 256 Kb block size, MTJ BRAM is 3.06× denser and 55% more energy efficient than SRAM BRAM, and its Fmax of 274 MHz is adequate for most FPGA system clock domains. We also detail further enhancements that allow these 256 Kb MTJ BRAMs to run at a higher speed of 353 MHz when used as streaming FIFOs, which are very common in FPGA designs, and we describe how the non-volatility of MTJ BRAM enables novel on-chip configuration and power-down modes. For a RAM architecture similar to that of the latest commercial FPGAs, MTJ BRAMs could expand FPGA memory capacity by 2.95× with no increase in die size.
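As a quick sanity check on the figures quoted above, the short Python sketch below converts the two Fmax values into clock periods, computes the relative clock-rate gain of the FIFO-optimized mode, and expresses the 256 Kb block size in bytes. It only restates the abstract's numbers; the constant and function names are illustrative, and it assumes 1 Kb = 1024 bits, the usual convention for memory block sizes.

```python
# Worked arithmetic on figures quoted in the abstract (256 Kb MTJ BRAM).
# Assumption: "Kb" means 1024 bits, the usual convention for memory block sizes.

BLOCK_KB = 256
BITS_PER_KB = 1024

FMAX_BASE_MHZ = 274.0   # general-purpose MTJ BRAM Fmax quoted in the abstract
FMAX_FIFO_MHZ = 353.0   # enhanced Fmax for streaming FIFOs quoted in the abstract


def period_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / fmax_mhz


# Block capacity in bits and bytes.
block_bits = BLOCK_KB * BITS_PER_KB
block_bytes = block_bits // 8

print(f"block size: {block_bits} bits = {block_bytes // 1024} KiB")
print(f"base period: {period_ns(FMAX_BASE_MHZ):.2f} ns")
print(f"FIFO period: {period_ns(FMAX_FIFO_MHZ):.2f} ns")
print(f"FIFO clock-rate gain: {FMAX_FIFO_MHZ / FMAX_BASE_MHZ:.3f}x")
```

Running it reports a 32 KiB block, a 3.65 ns period at 274 MHz, a 2.83 ns period at 353 MHz, and roughly a 1.29× higher clock rate for the FIFO-optimized mode.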
Enhancing FPGAs with Magnetic Tunnel Junction-Based Block RAMs