Abstract
High-Level Synthesis (HLS) promises to shorten the FPGA design cycle significantly by raising the abstraction level of the design entry to high-level languages such as C/C++. However, applications that use dynamic, pointer-based data structures and dynamic memory allocation remain difficult to implement well, even though such constructs are widely used in software. Automated optimizations that leverage the memory bandwidth of FPGAs by distributing the application data over separate banks of on-chip memory are often ineffective in the presence of dynamic data structures because pointer-based memory accesses are not analyzed automatically. In this work, we take a step toward closing this gap. We present a static analysis for pointer-manipulating programs that automatically splits heap-allocated data structures into disjoint, independent regions. The analysis leverages recent advances in separation logic, a theoretical framework for reasoning about heap-allocated data that has been successfully applied in recent software verification tools. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations that enable automatic loop parallelization and memory partitioning by off-the-shelf HLS tools. We demonstrate successful loop parallelization and memory partitioning by our tool flow using three real-life applications that build, traverse, update, and dispose of dynamically allocated data structures. Our case studies, which compare the automatically parallelized implementations with the direct HLS implementations, show an average latency reduction by a factor of 2× across our benchmarks.
Separation Logic for High-Level Synthesis