Abstract
Multiported RAMs are essential for high-performance parallel computation systems. VLIW and vector processors, CGRAs, DSPs, CMPs, and other processing systems often rely upon multiported memories for parallel access. Although memories with a large number of read and write ports are important, their high implementation cost means that they are used sparingly. As a result, FPGA vendors provide only dual-ported block RAMs (BRAMs) to handle the majority of usage patterns. Furthermore, recent attempts to create FPGA-based multiported memories suffer from low storage utilization. Whereas most approaches provide simple unidirectional ports, each fixed as either read or write, others propose true bidirectional ports in which each port dynamically switches between read and write. True RAM ports are useful for systems with transceivers and provide high RAM flexibility; however, this flexibility incurs high BRAM consumption. In this article, a novel, modular, and BRAM-based switched multiported RAM architecture is proposed. In addition to unidirectional ports with fixed read/write, this switched architecture allows a group of write ports to switch with another group of read ports dynamically, hence altering the number of active ports. The proposed switched-ports architecture is less flexible than a true-multiported RAM where each port is switched individually. Nevertheless, switched memories can dramatically reduce BRAM consumption compared to true ports for systems with alternating port requirements. Previous live-value-table (LVT) and XOR approaches are merged and optimized into a generalized and modular structure that we call an invalidation-based live-value-table (I-LVT).
Like a regular LVT, the I-LVT determines the correct bank to read from, but it differs in how updates to the table are made; the LVT approach requires multiple write ports, often leading to an area-intensive register-based implementation, whereas the XOR approach suffers from excessive storage overhead since wider memories are required to accommodate the XOR-ed data. Two specific I-LVT implementations are proposed and evaluated: binary and thermometer coding. The I-LVT approach is especially suitable for deep memories because the table is implemented only in SRAM cells. The I-LVT method gives higher performance while occupying fewer BRAMs than earlier approaches: for several configurations, BRAM usage is reduced by more than 44% and clock speed is improved by more than 76%. The I-LVT can be used with fixed-port, true-port, or the proposed switched-port architectures. Formal proofs for the suggested methods, resource consumption analysis, usage guidelines, and an analytic comparison to other methods are provided. A fully parameterized Verilog implementation is released as an open-source library. The library has been extensively tested using Altera’s EDA tools.
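The role of a live-value table, as described above, can be illustrated with a small behavioral model. The sketch below is not the authors' I-LVT and is not taken from their Verilog library; it models only the plain LVT idea of one data bank per write port plus a table recording which bank holds the most recent value for each address. The class name `LVTMemory`, its parameters, the read-before-write ordering, and the rule that two ports may not write the same address in one cycle are all assumptions made for illustration. Per-read-port bank replication, which a hardware implementation would need, is deliberately ignored.

```python
class LVTMemory:
    """Behavioral sketch of a multi-write, multi-read memory using a live-value table.

    Hypothetical model for illustration only: one bank per write port, and an
    LVT entry per address naming the bank that was written most recently.
    """

    def __init__(self, depth, n_write, n_read):
        self.depth = depth
        self.n_write = n_write
        self.n_read = n_read
        # One single-write bank per write port (hardware would also
        # replicate each bank per read port; that is not modeled here).
        self.banks = [[0] * depth for _ in range(n_write)]
        # lvt[addr] = index of the bank holding the live value for addr.
        self.lvt = [0] * depth

    def cycle(self, writes, read_addrs):
        """Simulate one clock cycle.

        writes:     list of (write_port, addr, data) tuples
        read_addrs: list of addresses, one per active read port
        Reads return the values stored before this cycle's writes
        (an assumed read-before-write policy).
        """
        if len(read_addrs) > self.n_read:
            raise ValueError("too many reads for the available read ports")

        # Each read consults the LVT to pick the bank with the live value.
        results = [self.banks[self.lvt[addr]][addr] for addr in read_addrs]

        seen_ports, seen_addrs = set(), set()
        for port, addr, data in writes:
            if not 0 <= port < self.n_write:
                raise ValueError("invalid write port")
            if port in seen_ports:
                raise ValueError("a write port can be used once per cycle")
            if addr in seen_addrs:
                raise ValueError("conflicting writes to one address (assumed illegal)")
            seen_ports.add(port)
            seen_addrs.add(addr)
            self.banks[port][addr] = data  # the port writes only its own bank
            self.lvt[addr] = port          # the table now points at that bank
        return results


if __name__ == "__main__":
    mem = LVTMemory(depth=8, n_write=2, n_read=2)
    mem.cycle([(0, 3, 11), (1, 5, 22)], [])   # two writes in one cycle
    print(mem.cycle([], [3, 5]))              # two reads in one cycle
    mem.cycle([(1, 3, 33)], [])               # port 1 overwrites address 3
    print(mem.cycle([], [3, 5]))
```

In this model every write port must update the table in the same cycle, which is the multi-write-port requirement the abstract identifies as the reason LVTs are often built from registers. Bank 0 still holds a stale value for address 3 after the overwrite, and only the table entry makes the read return the live one.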
Supplemental Material
Supplemental movie, appendix, image, and software files for Modular Switched Multiported SRAM-Based Memories.