research-article

An Efficient Memory Partitioning Approach for Multi-Pattern Data Access via Data Reuse

Published: 05 February 2019

Abstract

Memory bandwidth has become a bottleneck that limits performance gains when the datapath is optimized for parallelism. Memory partitioning is a practical way to reduce bank-level conflicts and increase bandwidth on a field-programmable gate array (FPGA). In this work, we propose a memory partitioning approach for multi-pattern data access. First, we combine multiple patterns into a single pattern to reduce the complexity of handling multiple patterns. Then, we perform data reuse analysis on the combined pattern to find data reuse opportunities and the non-reusable data pattern. Finally, we propose an efficient bank mapping algorithm with low complexity and low overhead to find the optimal memory partitioning solution. Experimental results demonstrate that, compared to the state-of-the-art method, our approach reduces the number of block RAMs by 58.9% on average, with a 79.6% reduction in SLICEs, 85.3% in LUTs, 67.9% in flip-flops, 54.6% in DSP48Es, 83.9% in SRLs, 50.0% in storage overhead, 95.0% in execution time, and 77.3% in dynamic power consumption. Meanwhile, performance improves by 14.0% on average.
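The idea of combining patterns and then searching for a conflict-free bank mapping can be illustrated with a minimal sketch. This is not the algorithm proposed in the article: it assumes that a pattern is a list of integer offset tuples, that combining patterns means taking the union of their offsets, and that banks are assigned by a linear mapping bank(x) = (alpha · x) mod N found by brute force. The function names `combine_patterns`, `is_conflict_free`, and `find_partition` are hypothetical, and the data reuse analysis step is omitted.

```python
from itertools import product


def combine_patterns(patterns):
    """Merge several access patterns into one (assumed here: union of offsets)."""
    return sorted(set().union(*map(set, patterns)))


def is_conflict_free(pattern, alpha, n_banks):
    """True if every offset in the pattern maps to a different bank."""
    banks = {
        sum(a * x for a, x in zip(alpha, offset)) % n_banks
        for offset in pattern
    }
    return len(banks) == len(pattern)


def find_partition(pattern, max_banks=64):
    """Brute-force the smallest bank count with a conflict-free linear mapping.

    Returns (n_banks, alpha), or None if no mapping exists up to max_banks.
    """
    dims = len(pattern[0])
    # At least one bank per simultaneously accessed element is required.
    for n_banks in range(len(pattern), max_banks + 1):
        for alpha in product(range(n_banks), repeat=dims):
            if is_conflict_free(pattern, alpha, n_banks):
                return n_banks, alpha
    return None


# Two illustrative 2-D patterns: three elements in a row, three in a column.
row_pattern = [(0, 0), (0, 1), (0, 2)]
col_pattern = [(0, 1), (1, 1), (2, 1)]

combined = combine_patterns([row_pattern, col_pattern])
result = find_partition(combined)
print(combined, result)
```

Because the linear mapping depends only on differences between offsets, a mapping that is conflict-free for the pattern at one position stays conflict-free when the pattern slides across the array. The exhaustive search is exponential in the number of dimensions, which is one reason a low-complexity bank mapping algorithm is of interest.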


    • Published in

      ACM Transactions on Reconfigurable Technology and Systems  Volume 12, Issue 1
      March 2019
      115 pages
      ISSN:1936-7406
      EISSN:1936-7414
      DOI:10.1145/3310278
      • Editor: Deming Chen

      Copyright © 2019 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 February 2019
      • Accepted: 1 November 2018
      • Revised: 1 October 2018
      • Received: 1 December 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
