Abstract
Memory bandwidth has become a bottleneck that impedes performance improvement during parallelism optimization of the datapath. Memory partitioning is a practical approach to reducing bank-level conflicts and increasing memory bandwidth on field-programmable gate arrays (FPGAs). In this work, we propose a memory partitioning approach for multi-pattern data access. First, we combine multiple access patterns into a single pattern to reduce the complexity of handling multiple patterns. Then, we perform data reuse analysis on the combined pattern to identify data reuse opportunities and the non-reusable data pattern. Finally, we propose an efficient bank mapping algorithm with low complexity and low overhead to find the optimal memory partitioning solution. Experimental results demonstrate that, compared to the state-of-the-art method, our approach reduces the number of block RAMs by 58.9% on average, together with average reductions of 79.6% in slices, 85.3% in LUTs, 67.9% in flip-flops, 54.6% in DSP48Es, 83.9% in SRLs, 50.0% in storage overhead, 95.0% in execution time, and 77.3% in dynamic power consumption. Meanwhile, performance improves by 14.0% on average.
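The abstract does not spell out the algorithms behind its three steps, so the following is only a minimal sketch of how such a flow could look under stated assumptions, not the authors' method. It assumes that (1) patterns are combined by taking the union of their offsets, (2) the access window slides one column per iteration and previously fetched elements are kept in registers, so in steady state only the rightmost offset of each row must be fetched from memory, and (3) bank mapping uses a swapped-in linear-transformation scheme, bank = (a0*r + a1*c) mod N, found by brute force. All function names and the `max_banks`/`max_coef` limits are hypothetical.

```python
from itertools import product
from typing import Iterable, Optional, Set, Tuple

# An access offset (row, column) relative to the current loop iteration.
Offset = Tuple[int, int]


def combine_patterns(patterns: Iterable[Set[Offset]]) -> Set[Offset]:
    """Assumed step 1: merge several access patterns into one by taking the
    union of their offsets, so a single partitioning serves every pattern."""
    combined: Set[Offset] = set()
    for pattern in patterns:
        combined |= pattern
    return combined


def non_reusable(pattern: Set[Offset]) -> Set[Offset]:
    """Assumed step 2: the window slides one column per iteration and fetched
    elements stay in registers. Offset (r, c) now holds the element that was
    offset (r, c + t) t iterations ago, so it is reusable iff the same row has
    an offset further right. Only each row's rightmost offset needs a fetch."""
    return {
        (r, c) for (r, c) in pattern
        if not any(pr == r and pc > c for (pr, pc) in pattern)
    }


def bank_mapping(pattern: Set[Offset], max_banks: int = 64,
                 max_coef: int = 8) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Swapped-in step 3: linear-transformation banking,
    bank(x) = (a0*x0 + a1*x1) mod N. Returns the smallest N (with a
    coefficient pair) that sends every offset to a distinct bank. Because the
    mapping is linear, shifting the base address permutes the banks, so the
    result is conflict-free for every iteration."""
    # N can never be smaller than the number of simultaneous accesses.
    for n in range(len(pattern), max_banks + 1):
        for a0, a1 in product(range(max_coef), repeat=2):
            banks = {(a0 * r + a1 * c) % n for (r, c) in pattern}
            if len(banks) == len(pattern):
                return n, (a0, a1)
    return None


# Two hypothetical 3x3 patterns (Prewitt-style kernels with a zero middle
# row and a zero middle column, respectively).
rows_pattern = {(r, c) for r in (-1, 1) for c in (-1, 0, 1)}
cols_pattern = {(r, c) for r in (-1, 0, 1) for c in (-1, 1)}

combined = combine_patterns([rows_pattern, cols_pattern])
fetched = non_reusable(combined)
print(len(combined), sorted(fetched), bank_mapping(fetched))
# prints: 8 [(-1, 1), (0, 1), (1, 1)] (3, (1, 0))
```

In this toy example the combined pattern touches 8 elements per iteration, so a partitioning without reuse would need at least 8 banks, while after the reuse analysis only 3 elements are fetched per iteration and a 3-bank mapping suffices. The brute-force search is chosen for clarity; it is not the low-complexity bank mapping algorithm the abstract refers to.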
An Efficient Memory Partitioning Approach for Multi-Pattern Data Access via Data Reuse