
Programming and Synthesis for Software-defined FPGA Acceleration: Status and Future Prospects

Published: 13 September 2021

Abstract

FPGA-based accelerators are increasingly popular across a broad range of applications because they offer massive parallelism, high energy efficiency, and great flexibility for customization. However, the difficulty of programming and integrating FPGAs has hindered their widespread adoption. Since the mid-2000s, there has been extensive research and development toward making FPGAs accessible to software-inclined developers in addition to hardware specialists. Many programming models and automated synthesis tools, such as high-level synthesis, have been proposed to tackle this grand challenge. In this survey, we describe the progression and future prospects of the ongoing journey toward significantly improving the software programmability of FPGAs. We first provide a taxonomy of the essential techniques for building a high-performance FPGA accelerator, which requires customizing the compute engines, memory hierarchy, and data representations. We then summarize a rich spectrum of work on programming abstractions and optimizing compilers that offer different trade-offs between performance and productivity. Finally, we highlight several additional challenges and opportunities that deserve extra attention from the community to bring FPGA-based computing to the masses.

References

  1. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. Retrieved from https://arxiv.org/abs/1603.04467.
  2. Mohamed S. Abdelfattah and Vaughn Betz. 2014. Networks-on-Chip for FPGAs: Hard, soft or mixed? ACM Trans. Reconfig. Technol. Syst. 7, 3 (2014), 1–22.
  3. Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O'Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, et al. 2018. DLA: Compiler and FPGA overlay for neural network inference acceleration. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18). 411–4117.
  4. Michael Adler, Kermin E. Fleming, Angshuman Parashar, Michael Pellauer, and Joel Emer. 2011. Leap scratchpads: Automatic memory and cache management for reconfigurable logic. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11).
  5. Jason Agron. 2009. Domain-specific language for HW/SW Co-Design for FPGAs. In IFIP Working Conference on Domain-Specific Languages.
  6. Muhammed Al Kadi, Benedikt Janssen, and Michael Huebner. 2016. FGPU: An SIMT-architecture for FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16). 254–263.
  7. Mythri Alle, Antoine Morvan, and Steven Derrien. 2013. Runtime dependency analysis for loop pipelining in high-level synthesis. In Proceedings of the Design Automation Conference (DAC'13).
  8. Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman Amarasinghe. 2019. Tiramisu: A polyhedral compiler for expressing fast and portable code. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'19).
  9. Samridhi Bansal, Hsuan Hsiao, Tomasz Czajkowski, and Jason H. Anderson. 2018. High-level synthesis of software-customizable floating-point cores. In Proceedings of the Design, Automation, and Test in Europe (DATE'18).
  10. Cedric Bastoul. 2004. Code generation in the polyhedral model is easier than you think. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'04).
  11. Shuvra S. Bhattacharyya, Gordon Brebner, Jörn W. Janneck, Johan Eker, Carl Von Platen, Marco Mattavelli, and Mickaël Raulet. 2009. OpenDF: A dataflow toolset for reconfigurable hardware and multicore systems. ACM SIGARCH Comput. Architect. News 36, 5 (2009), 29–35.
  12. Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46, 5 (1999), 720–748.
  13. David Boland and George A. Constantinides. 2010. Automated precision analysis: A polynomial algebraic approach. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'10).
  14. David Boland and George A. Constantinides. 2012. A scalable approach for automated precision analysis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'12).
  15. Uday Bondhugula, Albert Hartono, Jagannathan Ramanujam, and Ponnuswamy Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08).
  16. Uday Bondhugula, Jagannathan Ramanujam, and Ponnuswamy Sadayappan. 2007. Automatic mapping of nested loops to FPGAs. In Proceedings of the ACM SIGPLAN Conference on Principles and Practice of Parallel Programming (PPoPP'07).
  17. Alexander Brant and Guy G. F. Lemieux. 2012. ZUMA: An open FPGA overlay architecture. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'12).
  18. Pavan Kumar Bussa, Jeffrey Goeders, and Steven J. E. Wilton. 2017. Accelerating In-System FPGA debug of high-level synthesis circuits using incremental compilation techniques. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'17).
  19. Cadence. 2020. Stratus High-Level Synthesis. Retrieved from https://www.cadence.com/content/dam/cadence-www/global/en_US/documents/tools/digital-design-signoff/stratus-ds.pdf.
  20. Nazanin Calagar, Stephen D. Brown, and Jason H. Anderson. 2014. Source-level debugging for FPGA high-level synthesis. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14). 1–8.
  21. Andrew Canis, Jason H. Anderson, and Stephen D. Brown. 2013. Multi-pumping for resource reduction in FPGA high-level synthesis. In Proceedings of the Design, Automation, and Test in Europe (DATE'13).
  22. Andrew Canis, Stephen D. Brown, and Jason H. Anderson. 2014. Modulo SDC scheduling with recurrence minimization in high-level synthesis. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).
  23. Andrew Canis, Jongsok Choi, Mark Aldham, Victor Zhang, Ahmed Kammoona, Jason H. Anderson, Stephen Brown, and Tomasz Czajkowski. 2011. LegUp: High-level synthesis for FPGA-based processor/accelerator systems. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11).
  24. Andrew Canis, Jongsok Choi, Blair Fort, Ruolong Lian, Qijing Huang, Nazanin Calagar, Marcel Gort, Jia Jun Qin, Mark Aldham, Tomasz Czajkowski, et al. 2013. From software to accelerators with LegUp high-level synthesis. In Proceedings of the International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES'13).
  25. Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, and Dhireesha Kudithipudi. 2019. Deep Positron: A deep neural network using the posit number system. In Proceedings of the Design, Automation, and Test in Europe (DATE'19).
  26. Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, et al. 2016. A cloud-scale acceleration architecture. In Proceedings of the International Symposium on Microarchitecture (MICRO'16).
  27. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI'18).
  28. Tao Chen, Shreesha Srinath, Christopher Batten, and G. Edward Suh. 2018. An architectural framework for accelerating dynamic parallel algorithms on reconfigurable hardware. In Proceedings of the International Symposium on Microarchitecture (MICRO'18).
  29. Tao Chen and G. Edward Suh. 2016. Efficient data supply for hardware accelerators with prefetching and access/execute decoupling. In Proceedings of the International Symposium on Microarchitecture (MICRO'16).
  30. Xinyu Chen, Ronak Bajaj, Yao Chen, Jiong He, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2019. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'19).
  31. Yao Chen, Swathi T. Gurumani, Yun Liang, Guofeng Li, Donghui Guo, Kyle Rupnow, and Deming Chen. 2016. FCUDA-NoC: A scalable and efficient network-on-chip implementation for the CUDA-to-FPGA flow. IEEE Trans. Very Large Scale Integr. Syst. 24, 6 (2016), 2220–2233.
  32. Yu Ting Chen and Jason H. Anderson. 2017. Automated generation of banked memory architectures in the high-level synthesis of multi-threaded software. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'17).
  33. Yu-Ting Chen, Jason Cong, Zhenman Fang, Jie Lei, and Peng Wei. 2016. When Spark meets FPGAs: A case study for next-generation DNA sequencing acceleration. In Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud'16).
  34. Jianyi Cheng, Lana Josipovic, George A. Constantinides, Paolo Ienne, and John Wickerson. 2020. Combining dynamic & static scheduling in high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20).
  35. Yuze Chi, Jason Cong, Peng Wei, and Peipei Zhou. 2018. SODA: Stencil with optimized dataflow architecture. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18).
  36. Yuze Chi, Licheng Guo, Young-kyu Choi, Jie Wang, and Jason Cong. 2021. Extending high-level synthesis for task-parallel programs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21).
  37. Jongsok Choi, Stephen Brown, and Jason Anderson. 2013. From software threads to parallel hardware in high-level synthesis for FPGAs. In Proceedings of the International Conference on Field Programmable Technology (FPT'13).
  38. Jongsok Choi, Kevin Nam, Andrew Canis, Jason Anderson, Stephen Brown, and Tomasz Czajkowski. 2012. Impact of cache architecture and interface on performance and area of FPGA-based processor/parallel-accelerator systems. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'12).
  39. Young-kyu Choi, Yuze Chi, Weikang Qiao, Nikola Samardzic, and Jason Cong. 2021. HBM Connect: High-performance HLS interconnect for FPGA HBM. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21).
  40. Young-Kyu Choi, Yuze Chi, Jie Wang, and Jason Cong. 2020. FLASH: Fast, parallel, and accurate simulator for HLS. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. (2020).
  41. Young-kyu Choi and Jason Cong. 2017. HLScope: High-level performance debugging for FPGA designs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'17).
  42. Young-kyu Choi and Jason Cong. 2018. HLS-based optimization and design space exploration for applications with variable loop bounds. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18).
  43. Young-kyu Choi, Jason Cong, Zhenman Fang, Yuchen Hao, Glenn Reinman, and Peng Wei. 2016. A quantitative analysis on microarchitectures of modern CPU-FPGA platforms. In Proceedings of the Design Automation Conference (DAC'16).
  44. Young-Kyu Choi, Jason Cong, Zhenman Fang, Yuchen Hao, Glenn Reinman, and Peng Wei. 2019. In-depth analysis on microarchitectures of modern heterogeneous CPU-FPGA platforms. ACM Trans. Reconfig. Technol. Syst. (2019).
  45. Young-kyu Choi, Peng Zhang, Peng Li, and Jason Cong. 2017. HLScope+: Fast and accurate performance estimation for FPGA HLS. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17).
  46. Christopher H. Chou, Aaron Severance, Alex D. Brant, Zhiduo Liu, Saurabh Sant, and Guy G. F. Lemieux. 2011. VEGAS: Soft vector processor with scratchpad memory. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11).
  47. Eric S. Chung, James C. Hoe, and Ken Mai. 2011. CoRAM: An in-fabric memory architecture for FPGA-based computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11).
  48. Alessandro Cilardo and Luca Gallo. 2015. Improving multibank memory access parallelism with lattice-based partitioning. ACM Trans. Architect. Code Optimiz. 11, 4 (2015).
  49. Albert Cohen, Marc Sigler, Sylvain Girbal, Olivier Temam, David Parello, and Nicolas Vasilache. 2005. Facilitating the search for compositions of program transformations. In Proceedings of the International Conference on Supercomputing (ICS'05).
  50. Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, and Peipei Zhou. 2018. Best-effort FPGA programming: A few steps can go a long way. Retrieved from https://arxiv.org/abs/1807.01340.
  51. Jason Cong, Zhenman Fang, Muhuan Huang, Libo Wang, and Di Wu. 2017. CPU-FPGA coscheduling for big data applications. IEEE Design Test 35, 1 (2017), 16–22.
  52. Jason Cong, Zhenman Fang, Muhuan Huang, Peng Wei, Di Wu, and Cody Hao Yu. 2018. Customizable computing–from single chip to datacenters. Proc. IEEE 107, 1 (2018), 185–203.
  53. Jason Cong, Zhenman Fang, Michael Lo, Hanrui Wang, Jingxian Xu, and Shaochong Zhang. 2018. Understanding performance differences of FPGAs and GPUs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  54. Jason Cong, Muhuan Huang, Peichen Pan, Yuxin Wang, and Peng Zhang. 2016. Source-to-source optimization for HLS. FPGAs Softw. Program. (2016).
  55. Jason Cong, Wei Jiang, Bin Liu, and Yi Zou. 2011. Automatic memory partitioning and scheduling for throughput and power optimization. ACM Trans. Design Autom. Electron. Syst. 16, 2 (2011), 1–25.
  56. Jason Cong, Peng Li, Bingjun Xiao, and Peng Zhang. 2016. An optimal microarchitecture for stencil computation acceleration based on nonuniform partitioning of data reuse buffers. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 35, 3 (2016), 407–418.
  57. Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, and Zhiru Zhang. 2011. High-level synthesis for FPGAs: From prototyping to deployment. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 30, 4 (2011), 473–491.
  58. Jason Cong and Jie Wang. 2018. PolySA: Polyhedral-based systolic array auto-compilation. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18).
  59. Jason Cong, Peng Wei, Cody Hao Yu, and Peng Zhang. 2018. Automated accelerator generation and optimization with composable, parallel and pipeline architecture. In Proceedings of the Design Automation Conference (DAC'18).
  60. Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 2017. Bandwidth optimization through on-chip memory restructuring for HLS. In Proceedings of the Design Automation Conference (DAC'17).
  61. Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 2018. Latte: Locality aware transformation for high-level synthesis. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  62. Jason Cong and Zhiru Zhang. 2006. An efficient and versatile scheduling algorithm based on SDC formulation. In Proceedings of the Design Automation Conference (DAC'06).
  63. James Coole and Greg Stitt. 2010. Intermediate fabrics: Virtual architectures for circuit portability and fast placement and routing. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'10).
  64. Philippe Coussy, Cyrille Chavet, Pierre Bomel, Dominique Heller, Eric Senn, and Eric Martin. 2008. GAUT: A high-level synthesis tool for DSP applications. High-Level Synth. (2008).
  65. John Curreri, Seth Koehler, Alan D. George, Brian Holland, and Rafael Garcia. 2010. Performance analysis framework for high-level language applications in reconfigurable computing. ACM Trans. Reconfig. Technol. Syst. 3, 1 (2010), 1–23.
  66. Tomasz S. Czajkowski, Utku Aydonat, Dmitry Denisenko, John Freeman, Michael Kinsner, David Neto, Jason Wong, Peter Yiannacouras, and Deshanand P. Singh. 2012. From OpenCL to high-performance hardware on FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'12).
  67. Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph processing framework on FPGA: A case study of breadth-first search. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16).
  68. Steve Dai, Gai Liu, and Zhiru Zhang. 2018. A scalable approach to exact resource-constrained scheduling based on a joint SDC and SAT formulation. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18).
  69. Steve Dai, Gai Liu, Ritchie Zhao, and Zhiru Zhang. 2017. Enabling adaptive loop pipelining in high-level synthesis. In Proceedings of the Asilomar Conference on Signals, Systems, and Computers.
  70. Steve Dai, Mingxing Tan, Kecheng Hao, and Zhiru Zhang. 2014. Flushing-enabled loop pipelining for high-level synthesis. In Proceedings of the Design Automation Conference (DAC'14).
  71. Steve Dai, Ritchie Zhao, Gai Liu, Shreesha Srinath, Udit Gupta, Christopher Batten, and Zhiru Zhang. 2017. Dynamic hazard resolution for pipelining irregular loops in high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17).
  72. Steve Dai, Yuan Zhou, Hang Zhang, Ecenur Ustun, Evangeline F. Y. Young, and Zhiru Zhang. 2018. Fast and accurate estimation of quality of results in high-level synthesis with machine learning. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  73. Luka Daoud, Dawid Zydek, and Henry Selvaraj. 2014. A survey of high level synthesis languages, tools, and compilers for reconfigurable high performance computing. Adv. Syst. Sci. (2014).
  74. Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao, Ming Liu, Jeremy Fowers, Kalin Ovtcharov, Anna Vinogradsky, Sarah Massengill, Lita Yang, Ray Bittner, et al. 2020. Pushing the limits of narrow precision inferencing at cloud scale with Microsoft floating point. Adv. Neural Info. Process. Syst. (2020).
  75. Florent De Dinechin and Bogdan Pasca. 2011. Designing custom arithmetic data paths with FloPoCo. IEEE Design Test Comput. 28, 4 (2011), 18–27.
  76. Luiz Henrique De Figueiredo and Jorge Stolfi. 2004. Affine arithmetic: Concepts and applications. Numer. Algor. (2004).
  77. Johannes de Fine Licht, Simon Meierhans, and Torsten Hoefler. 2018. Transformations of high-level synthesis codes for high-performance computing. Retrieved from https://arxiv.org/abs/1805.08288.
  78. Steven Derrien, Thibaut Marty, Simon Rokicki, and Tomofumi Yuki. 2020. Toward speculative loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 11 (2020), 4229–4239.
  79. Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, et al. 2018. Fast inference of deep neural networks in FPGAs for particle physics. J. Instrument. (2018).
  80. David Durst, Matthew Feldman, Dillon Huff, David Akeley, Ross Daly, Gilbert Louis Bernstein, Marco Patrignani, Kayvon Fatahalian, and Pat Hanrahan. 2020. Type-directed scheduling of streaming accelerators. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'20).
  81. Stephen A. Edwards, Richard Townsend, Martha Barker, and Martha A. Kim. 2019. Compositional dataflow circuits. ACM Trans. Embed. Comput. Syst. 18, 1 (2019), 1–27.
  82. Johan Eker and J. Janneck. 2003. CAL language report: Specification of the CAL actor language. ERL Tech. Memo UCB/ERL (2003).
  83. Fatemeh Eslami and Steven J. E. Wilton. 2018. Rapid triggering capability using an adaptive overlay during FPGA debug. ACM Trans. Design Autom. Electron. Syst. 23, 6 (2018), 1–25.
  84. Zhenman Fang, Farnoosh Javadi, Jason Cong, and Glenn Reinman. 2019. Understanding performance gains of accelerator-rich architectures. In Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors (ASAP'19).
  85. Lorenzo Ferretti, Jihye Kwon, Giovanni Ansaloni, Giuseppe Di Guglielmo, Luca P. Carloni, and Laura Pozzi. 2020. Leveraging prior knowledge for effective design-space exploration in high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 11 (2020), 3736–3747.
  86. Pietro Fezzardi, Marco Lattuada, and Fabrizio Ferrandi. 2017. Using efficient path profiling to optimize memory consumption of on-chip debugging for high-level synthesis. ACM Trans. Embed. Comput. Syst. 16, 5s (2017), 1–22.
  87. Christian Fobel, Gary Grewal, and Deborah Stacey. 2014. A scalable, serially equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).
  88. Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, et al. 2018. A configurable cloud-scale DNN processor for real-time AI. In Proceedings of the International Symposium on Computer Architecture (ISCA'18).
  89. Tushar Garg, Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. 2020. HopliteBuf: Network calculus-based design of FPGA NoCs with provably stall-free FIFOs. ACM Trans. Reconfig. Technol. Syst. 13, 2 (2020).
  90. Mohammad Ghasemzadeh, Mohammad Samragh, and Farinaz Koushanfar. 2018. ReBNet: Residual binarized neural network. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  91. Jeffrey Goeders and Steven J. E. Wilton. 2014. Effective FPGA debug for high-level synthesis generated circuits. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).
  92. Jeffrey Goeders and Steven J. E. Wilton. 2016. Signal-tracing techniques for In-System FPGA debugging of high-level synthesis circuits. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 36, 1 (2016), 83–96.
  93. Jeffrey B. Goeders, Guy G. F. Lemieux, and Steven J. E. Wilton. 2011. Deterministic timing-driven parallel placement by simulated annealing using half-box window decomposition. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig'11).
  94. Marcel Gort and Jason H. Anderson. 2010. Deterministic multi-core parallel routing for FPGAs. In Proceedings of the International Conference on Field Programmable Technology (FPT'10).
  95. Marcel Gort and Jason H. Anderson. 2013. Range and bitmask analysis for hardware optimization in high-level synthesis. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC'13).
  96. Ian Gray, Yu Chan, Jamie Garside, Neil Audsley, and Andy Wellings. 2015. Transparent hardware synthesis of Java for predictable large-scale distributed systems. Retrieved from https://arxiv.org/abs/1508.07142.
  97. Paul Grigoraş, Xinyu Niu, Jose G. F. Coutinho, Wayne Luk, Jacob Bower, and Oliver Pell. 2013. Aspect driven compilation for dataflow designs. In Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors (ASAP'13).
  98. Tobias Grosser, Armin Groesslinger, and Christian Lengauer. 2012. Polly–performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. (2012).
  99. Sikender Gul, Muhammad Faisal Siddiqui, and Naveed Ur Rehman. 2019. FPGA based real-time implementation of online EMD with fixed point architecture. IEEE Access 7 (2019), 176565–176577.
  100. Licheng Guo, Yuze Chi, Jie Wang, Jason Lau, Weikang Qiao, Ecenur Ustun, Zhiru Zhang, and Jason Cong. 2021. AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21).
  101. Licheng Guo, Jason Lau, Yuze Chi, Jie Wang, Cody Hao Yu, Zhe Chen, Zhiru Zhang, and Jason Cong. 2020. Analysis and optimization of the implicit broadcasts in FPGA HLS to improve maximum frequency. In Proceedings of the Design Automation Conference (DAC'20).
  102. Licheng Guo, Jason Lau, Zhenyuan Ruan, Peng Wei, and Jason Cong. 2019. Hardware acceleration of long read pairwise overlapping in genome sequencing: A race between FPGA and GPU. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).
  103. Peng Guo, Hong Ma, Ruizhi Chen, Pin Li, Shaolin Xie, and Donglin Wang. 2018. FBNA: A fully binarized neural network accelerator. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).
  104. Tae Jun Ham, Juan L. Aragón, and Margaret Martonosi. 2017. Decoupling data supply from computation for latency-tolerant communication in heterogeneous architectures. ACM Trans. Architect. Code Optimiz. 14, 2 (2017), 1–27.
  105. Mohamed Ben Hammouda, Philippe Coussy, and Loïc Lagadec. 2014. A design approach to automatically synthesize ANSI-C assertions during high-level synthesis of hardware accelerators. In Proceedings of the International Symposium on Circuits and Systems (ISCAS'14).
  106. Frank Hannig, Holger Ruckdeschel, Hritam Dutta, and Jürgen Teich. 2008. PARO: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Proceedings of the International Workshop on Applied Reconfigurable Computing (ARC'08).
  107. James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, and Pat Hanrahan. 2014. Darkroom: Compiling high-level image processing code into hardware pipelines. ACM Trans. Graph. 33, 4 (2014), 144:1–144:11.
  108. James Hegarty, Ross Daly, Zachary DeVito, Jonathan Ragan-Kelley, Mark Horowitz, and Pat Hanrahan. 2016. Rigel: Flexible multi-rate image processing hardware. ACM Trans. Graph. 35, 4 (2016), 1–11.
  109. Timothy Hickey, Qun Ju, and Maarten H. Van Emden. 2001. Interval arithmetic: From principles to implementation. J. ACM 48, 5 (2001), 1038–1068.
  110. Daniel Holanda Noronha, Ruizhe Zhao, Jeff Goeders, Wayne Luk, and Steven J. E. Wilton. 2019. On-Chip FPGA debug instrumentation for machine learning applications. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19).
  111. Chin Hau Hoo and Akash Kumar. 2018. ParaDRo: A parallel deterministic router based on spatial partitioning and scheduling. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18).
  112. Amir Hormati, Manjunath Kudlur, Scott Mahlke, David Bacon, and Rodric Rabbah. 2008. Optimus: Efficient realization of streaming applications on FPGAs. In Proceedings of the International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES'08).
  113. Hsuan Hsiao and Jason Anderson. 2019. Thread weaving: Static resource scheduling for multithreaded high-level synthesis. In Proceedings of the Design Automation Conference (DAC'19).
  114. Bohu Huang and Haibin Zhang. 2013. Application of multi-core parallel computing in FPGA placement. In Proceedings of the International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA'13).
  115. Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, and Jason Cong. 2016. Programming and runtime support to blaze FPGA accelerator deployment at datacenter scale. In Proceedings of the ACM Symposium on Cloud Computing.
  116. Sitao Huang, Gowthami Jayashri Manikandan, Anand Ramachandran, Kyle Rupnow, Wen-mei W. Hwu, and Deming Chen. 2017. Hardware acceleration of the pair-HMM algorithm for DNA variant calling. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17).
  117. Yuanjie Huang, Paolo Ienne, Olivier Temam, Yunji Chen, and Chengyong Wu. 2013. Elastic CGRAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'13).
  118. Stephen Ibanez, Gordon Brebner, Nick McKeown, and Noa Zilberman. 2019. The P4->NetFPGA workflow for line-rate packet processing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19).
  119. Mohsen Imani, Samuel Bosch, Sohum Datta, Sharadhi Ramakrishna, Sahand Salamat, Jan M. Rabaey, and Tajana Rosing. 2019. QuantHD: A quantization framework for hyperdimensional computing. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 10 (2019), 2268–2278.
  120. Mohsen Imani, Sahand Salamat, Behnam Khaleghi, Mohammad Samragh, Farinaz Koushanfar, and Tajana Rosing. 2019. SparseHD: Algorithm-hardware co-optimization for efficient high-dimensional computing. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).
  121. Intel. 2019. Intel Agilex F-Series FPGAs & SoCs. Retrieved from https://www.intel.com/content/www/us/en/products/programmable/fpga/agilex/f-series.html.
  122. Intel. 2020. Intel High Level Synthesis Compiler Pro Edition: Reference Manual. Retrieved from https://www.intel.com/content/www/us/en/programmable/documentation/ewa1462824960255.html.
  123. Intel. 2020. Intel SoC FPGAs. Retrieved from https://www.intel.ca/content/www/ca/en/products/programmable/soc.html.
  124. Intel. 2020. The oneAPI Specification. Retrieved from https://www.oneapi.com/.
  125. Christian Iseli and Eduardo Sanchez. 1993. Spyder: A reconfigurable VLIW processor using FPGAs. In Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines.
  126. Asif Islam and Nachiket Kapre. 2018. LegUp-NoC: High-level synthesis of loops with indirect addressing. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  127. Manish Kumar Jaiswal and Ray C. C. Cheung. 2013. Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support. Microelectr. J. 44, 5 (2013), 421–430.
  128. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the International Conference on Multimedia.
  129. Jiantong Jiang, Zeke Wang, Xue Liu, Juan Gómez-Luna, Nan Guan, Qingxu Deng, Wei Zhang, and Onur Mutlu. 2020. Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20).
  130. Lana Josipovic, Philip Brisk, and Paolo Ienne. 2017. An out-of-order load-store queue for spatial computing. ACM Trans. Embed. Comput. Syst. 16, 5s (2017), 1–19.
  131. Lana Josipović, Radhika Ghosal, and Paolo Ienne. 2018. Dynamically scheduled high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18).
  132. Lana Josipovic, Andrea Guerrieri, and Paolo Ienne. 2019. Speculative dataflow circuits. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19).
  133. Juniper. 2020. Juniper: Java Platform for High-performance and Real-time Large-scale Data. Retrieved from http://www.juniper-project.org/.
  134. Nachiket Kapre et al. 2018. Hoplite-Q: Priority-aware routing in FPGA overlay NoCs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  135. Nachiket Kapre and Jan Gray. 2015. Hoplite: Building austere overlay NoCs for FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'15).
  136. Nachiket Kapre and Deheng Ye. 2016. GPU-Accelerated high-level synthesis for bitwidth optimization of FPGA datapaths. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16).
  137. Soguy Mak karé Gueye, Gwenaël Delaval, Eric Rutten, Dominique Heller, and Jean-Philippe Diguet. 2018. A domain-specific language for autonomic managers in FPGA reconfigurable architectures. In Proceedings of the International Conference on Autonomic Computing (ICAC'18).
  138. Ryan Kastner, Janarbek Matai, and Stephen Neuendorffer. 2018. Parallel programming for FPGAs. Retrieved from https://arXiv:1805.03648.
  139. Keras. 2020. Keras. Simple. Flexible. Powerful. Retrieved from https://keras.io/.
  140. Ronan Keryell and Lin-Ya Yu. 2018. Early experiments using SYCL single-source modern C++ on Xilinx FPGA: Extended Abstract of technical presentation. In Proceedings of the International Workshop on OpenCL.
  141. Ahmed Khawaja, Joshua Landgraf, Rohith Prakash, Michael Wei, Eric Schkufza, and Christopher J. Rossbach. 2018. Sharing, protection, and compatibility for reconfigurable fabric with AmorphOS. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI'18).
  142. Soroosh Khoram, Jialiang Zhang, Maxwell Strange, and Jing Li. 2018. Accelerating graph analytics by co-optimizing storage and access on an FPGA-HMC platform. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18).
  143. Jeffrey Kingyens and J. Gregory Steffan. 2011. The potential for a GPU-Like overlay architecture for FPGAs. Intl. J. Reconfig. Comput. (2011).
  144. Adam B. Kinsman and Nicola Nicolici. 2009. Finite precision bit-width allocation using SAT-Modulo theory. In Design, Automation, and Test in Europe (DATE'09).
  145. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The tensor algebra compiler. Proc. ACM Program. Lang. (2017).
  146. Ana Klimovic and Jason H. Anderson. 2013. Bitwidth-optimized hardware accelerators with software fallback. In Proceedings of the International Conference on Field Programmable Technology (FPT'13).
  147. David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis et al. 2018. Spatial: A language and compiler for application accelerators. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'18).
  148. David Koeplinger, Raghu Prabhakar, Yaqi Zhang, Christina Delimitrou, Christos Kozyrakis, and Kunle Olukotun. 2016. Automatic generation of efficient accelerators for reconfigurable hardware. In Proceedings of the International Symposium on Computer Architecture (ISCA'16).
  149. Maciej Kurek, Tobias Becker, Thomas C. P. Chau, and Wayne Luk. 2014. Automating optimization of reconfigurable designs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'14).
  150. Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. 2019. HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19).
  151. Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, et al. 2020. SuSy: A programming model for productive construction of high-performance systolic arrays on FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'20).
  152. Chris Lavin and Alireza Kaviani. 2018. RapidWright: Enabling custom crafted implementations for FPGAs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  153. D.-U. Lee, Altaf Abdul Gaffar, Ray C. C. Cheung, Oskar Mencer, Wayne Luk, and George A. Constantinides. 2006. Accuracy-guaranteed bit-width optimization. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 25, 10 (2006), 1990–2000.
  154. David M. Lewis, Marcus H. van Ierssel, and Daniel H. Wong. 1993. A field programmable accelerator for compiled-code applications. In Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines.
  155. Peng Li, Louis-Noël Pouchet, and Jason Cong. 2014. Throughput optimization for high-level synthesis using resource constraints. In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT'14).
  156. Wuxi Li, Meng Li, Jiajun Wang, and David Z. Pan. 2017. UTPlaceF 3.0: A parallelization framework for modern FPGA global placement. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17).
  157. Shuang Liang, Shouyi Yin, Leibo Liu, Wayne Luk, and Shaojun Wei. 2018. FP-BNN: Binarized neural network on FPGA. Neurocomputing (2018).
  158. Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Haoxing Ren, Brucek Khailany, and David Z. Pan. 2020. DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 40, 4 (2020), 748–761.
  159. Junyi Liu, Samuel Bayliss, and George A. Constantinides. 2015. Offline synthesis of online dependence testing: Parametric loop pipelining for HLS. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'15).
  160. Ji Liu, Abdullah-Al Kafi, Xipeng Shen, and Huiyang Zhou. 2020. MKPipe: A compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. In Proceedings of the International Conference on Supercomputing (ICS'20).
  161. Junyi Liu, John Wickerson, Samuel Bayliss, and George A. Constantinides. 2017. Polyhedral-based dynamic loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 37, 9 (2017), 1802–1815.
  162. Leo Liu, Jay Weng, and Nachiket Kapre. 2019. RapidRoute: Fast assembly of communication structures for FPGA overlays. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).
  163. Qiang Liu, George A. Constantinides, Konstantinos Masselos, and Peter Y. K. Cheung. 2007. Automatic on-chip memory minimization for data reuse. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'07).
  164. Charles Lo and Paul Chow. 2016. Model-based optimization of high level synthesis directives. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'16).
  165. Charles Lo and Paul Chow. 2018. Multi-fidelity optimization for high-level synthesis directives. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).
  166. Charles Lo and Paul Chow. 2020. Hierarchical modelling of generators in design-space exploration. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'20).
  167. Alec Lu, Zhenman Fang, Weihua Liu, and Lesley Shannon. 2021. Demystifying the memory system of modern datacenter FPGAs for software programmers through microbenchmarking. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21).
  168. Adrian Ludwin and Vaughn Betz. 2011. Efficient and deterministic parallel placement for FPGAs. ACM Trans. Design Autom. Electron. Syst. 16, 3 (2011), 1–23.
  169. Adrian Ludwin, Vaughn Betz, and Ketan Padalia. 2008. High-quality, deterministic parallel placement for FPGAs on commodity hardware. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'08).
  170. Rui Ma, Jia-Ching Hsu, Tian Tan, Eriko Nurvitadhi, David Sheffield, Rob Pelt, Martin Langhammer, Jaewoong Sim, Aravind Dasu, and Derek Chiou. 2019. Specializing FGPU for persistent deep learning. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'19). 326–333.
  171. Xiaoyin Ma, Walid A. Najjar, and Amit K. Roy-Chowdhury. 2015. Evaluation and acceleration of high-throughput fixed-point object detection on FPGAs. IEEE Trans. Circ. Syst. Video Technol. 25, 6 (2015), 1051–1062.
  172. Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, and Hadi Esmaeilzadeh. 2016. TABLA: A unified template-based framework for accelerating statistical machine learning. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA'16).
  173. Hosein Mohammadi Makrani, Farnoud Farahmand, Hossein Sayadi, Sara Bondi, Sai Manoj Pudukotai Dinakarrao, Houman Homayoun, and Setareh Rafatirad. 2019. Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'19).
  174. Maxeler. 2020. Maxeler High-performance Dataflow Computing Systems. Retrieved from https://www.maxeler.com/products/software/maxcompiler/.
  175. Séamas McGettrick, Kunjan Patel, and Chris Bleakley. 2011. High performance programmable FPGA overlay for digital signal processing. In Proceedings of the International Conference on Reconfigurable Computing: Architectures, Tools and Applications (ARC'11).
  176. Atefeh Mehrabi, Aninda Manocha, Benjamin C. Lee, and Daniel J. Sorin. 2020. Prospector: Synthesizing efficient accelerators via statistical learning. In Proceedings of the Design, Automation, and Test in Europe (DATE'20).
  177. Richard Membarth, Oliver Reiche, Frank Hannig, Jürgen Teich, Mario Körner, and Wieland Eckert. 2016. Hipacc: A domain-specific language and compiler for image processing. IEEE Trans. Parallel Distrib. Syst. 27, 1 (2016), 210–224.
  178. Mentor. 2020. Catapult High-Level Synthesis. Retrieved from https://s3.amazonaws.com/s3.mentor.com/public_documents/datasheet/hls-lp/catapult-high-level-synthesis.pdf.
  179. Microchip. 2020. LegUp 9.1 Documentation. Retrieved from https://download-soc.microsemi.com/FPGA/HLS-EAP/docs/legup-9.1-docs/index.html.
  180. Microchip. 2020. Microchip Acquires High-Level Synthesis Tool Provider LegUp to Simplify Development of PolarFire FPGA-based Edge Compute Solutions. Retrieved from https://www.microchip.com/en-us/about/news-releases/products/microchip-acquires-high-level-synthesis-tool-provider-legup.
  181. Microsoft. 2020. A Microsoft Custom Data Type for Efficient Inference. Retrieved from https://www.microsoft.com/en-us/research/blog/a-microsoft-custom-data-type-for-efficient-inference/.
  182. Peter Milder, Franz Franchetti, James C. Hoe, and Markus Püschel. 2012. Computer generation of hardware for linear digital signal processing transforms. ACM Trans. Design Autom. Electron. Syst. 16, 3 (2012), 1–23.
  183. Yehdhih Moctar, Mirjana Stojilović, and Philip Brisk. 2018. Deterministic parallel routing for FPGAs based on Galois parallel execution model. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).
  184. Joshua S. Monson and Brad Hutchings. 2014. New approaches for in-system debug of behaviorally synthesized FPGA circuits. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).
  185. Joshua S. Monson and Brad L. Hutchings. 2015. Using source-level transformations to improve high-level synthesis debug and validation on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'15).
  186. Joshua S. Monson and Brad L. Hutchings. 2018. Enhancing debug observability for HLS-based FPGA circuits through source-to-source compilation. J. Parallel Distrib. Comput. 117 (2018), 148–160.
  187. Thierry Moreau, Tianqi Chen, Ziheng Jiang, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. VTA: An open hardware-software stack for deep learning. Retrieved from https://arXiv:1807.04188.
  188. Antoine Morvan, Steven Derrien, and Patrice Quinton. 2013. Polyhedral bubble insertion: A method to improve nested loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 32, 3 (2013), 339–352.
  189. Kevin E. Murray, Mohamed A. Elgammal, Vaughn Betz, Tim Ansell, Keith Rothman, and Alessandro Comodi. 2020. SymbiFlow and VPR: An open-source design flow for commercial and novel FPGAs. IEEE Micro (2020).
  190. Kevin E. Murray, Oleg Petelin, Sheng Zhong, Jia Min Wang, Mohamed Eldafrawy, Jean-Philippe Legault, Eugene Sha, Aaron G. Graham, Jean Wu, Matthew J. P. Walker, et al. 2020. VTR 8: High-performance CAD and customizable FPGA architecture modelling. ACM Trans. Reconfig. Technol. Syst. 13, 2 (2020), 339–352.
  191. Razvan Nane, Vlad-Mihai Sima, Christian Pilato, Jongsok Choi, Blair Fort, Andrew Canis, Yu Ting Chen, Hsuan Hsiao, Stephen Brown, Fabrizio Ferrandi, et al. 2016. A survey and evaluation of FPGA high-level synthesis tools. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 35, 10 (2016), 1591–1604.
  192. Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, Yuwei Ye, Apurva Koti, Adrian Sampson, and Zhiru Zhang. 2020. Predictable accelerator design with time-sensitive affine types. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'20).
  193. Mostafa W. Numan, Braden J. Phillips, Gavin S. Puddy, and Katrina Falkner. 2020. Towards automatic high-level code deployment on reconfigurable platforms: A survey of high-level synthesis tools and toolchains. IEEE Access 8 (2020), 174692–174722.
  194. Eriko Nurvitadhi, Gabriel Weisz, Yu Wang, Skand Hurkat, Marie Nguyen, James C. Hoe, José F. Martínez, and Carlos Guestrin. 2014. GraphGen: An FPGA framework for vertex-centric graph computation. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'14).
  195. William George Osborne, Ray C. C. Cheung, José Gabriel F. Coutinho, Wayne Luk, and Oskar Mencer. 2007. Automatic accuracy-guaranteed bit-width optimization for fixed and floating-point systems. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'07).
  196. Ganda Stephane Ouedraogo, Matthieu Gautier, and Olivier Sentieys. 2014. A frame-based domain-specific language for rapid prototyping of FPGA-based software-defined radios. EURASIP J. Adv. Signal Process. 1 (2014), 1–15.
  197. M. Akif Özkan, Oliver Reiche, Frank Hannig, and Jürgen Teich. 2016. FPGA-based accelerator design from a domain-specific language. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'16).
  198. Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, and Wen-Mei W. Hwu. 2009. FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs. In Proceedings of the Symposium on Application Specific Processors (SASP'09).
  199. Philippos Papaphilippou, Jiuxi Meng, and Wayne Luk. 2020. High-performance FPGA network switch architecture. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20).
  200. Dongjoon Park, Yuanlong Xiao, Nevo Magnezi, and André DeHon. 2018. Case for fast FPGA compilation using partial reconfiguration. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).
  201. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. Retrieved from https://arXiv:1912.01703.
  202. Ryan Pattison, Christian Fobel, Gary Grewal, and Shawki Areibi. 2015. Scalable analytic placement for FPGA on GPGPU. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig'15).
  203. Francesco Peverelli, Marco Rabozzi, Emanuele Del Sozzo, and Marco D. Santambrogio. 2018. OXiGen: A tool for automatic acceleration of C functions into dataflow FPGA-based kernels. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW'18).
  204. Christian Pilato and Fabrizio Ferrandi. 2013. Bambu: A modular framework for the high level synthesis of memory-intensive applications. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'13).
  205. Christian Pilato, Daniele Loiacono, Antonino Tumeo, Fabrizio Ferrandi, Pier Luca Lanzi, and Donatella Sciuto. 2010. Speeding-Up expensive evaluations in high-level synthesis using solution modeling and fitness inheritance. Comput. Intell. Exp. Optimiz. Problems (2010).
  206. Jose P. Pinilla and Steven J. E. Wilton. 2016. Enhanced source-level instrumentation for FPGA in-system debug of high-level synthesis designs. In Proceedings of the International Conference on Field Programmable Technology (FPT'16).
  207. Louis-Noel Pouchet, Peng Zhang, Ponnuswamy Sadayappan, and Jason Cong. 2013. Polyhedral-based data reuse optimization for configurable computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'13).
  208. Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, and Mark Horowitz. 2017. Programming heterogeneous systems from an image processing DSL. ACM Trans. Architect. Code Optimiz. 14, 3 (2017), 1–25.
  209. Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song et al. 2016. Going deeper with embedded FPGA platform for convolutional neural network. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16). 26–35.
  210. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices (2013).
  211. B. Ramakrishna Rau. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the International Symposium on Microarchitecture (MICRO'94).
  212. Oliver Reiche, M. Akif Özkan, Richard Membarth, Jürgen Teich, and Frank Hannig. 2017. Generating FPGA-Based image processing accelerators with Hipacc. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17).
  213. Hongbo Rong. 2017. Programmatic control of a compiler for generating high-performance spatial hardware. Retrieved from https://arXiv:1711.07606.
  214. Zhenyuan Ruan, Tong He, Bojie Li, Peipei Zhou, and Jason Cong. 2018. ST-Accel: A high-level programming platform for streaming applications on FPGA. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  215. Sahand Salamat, Mohsen Imani, Behnam Khaleghi, and Tajana Rosing. 2019. F5-HD: Fast flexible FPGA-based framework for refreshing hyperdimensional computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19).
  216. Andrew G. Schmidt, Neil Steiner, Matthew French, and Ron Sass. 2012. HwPMI: An extensible performance monitoring infrastructure for improving hardware design and productivity on FPGAs. Int. J. Reconfig. Comput. (2012).
  217. Robert Schreiber, Shail Aditya, Scott Mahlke, Vinod Kathail, B. Ramakrishna Rau, Darren Cronquist, and Mukund Sivaraman. 2002. PICO-NPA: High-level synthesis of nonprogrammable hardware accelerators. J. VLSI Signal Process. Syst. Signal Image Video Technol. 31, 2 (2002), 127–142.
  218. Jocelyn Sérot, François Berry, and Sameer Ahmed. 2013. CAPH: A language for implementing stream-processing applications on FPGAs. Embed. Syst. Design FPGAs (2013).
  219. Aaron Severance, Joe Edwards, Hossein Omidian, and Guy Lemieux. 2014. Soft vector processors with streaming pipelines. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'14).
  220. Aaron Severance and Guy Lemieux. 2012. VENICE: A compact vector processor for FPGA applications. In Proceedings of the International Conference on Field Programmable Technology (FPT'12).
  221. Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kyung Kim, Chenkai Shao, Asit Mishra, and Hadi Esmaeilzadeh. 2016. From high-level deep neural models to FPGAs. In Proceedings of the International Symposium on Microarchitecture (MICRO'16).
  222. Minghua Shen and Guojie Luo. 2015. Accelerate FPGA routing with parallel recursive partitioning. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'15).
  223. Minghua Shen and Guojie Luo. 2017. Corolla: GPU-accelerated FPGA routing based on subgraph dynamic expansion. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17).
  224. Minghua Shen, Guojie Luo, and Nong Xiao. 2020. Coarse-grained parallel routing with recursive partitioning for FPGAs. IEEE Trans. Parallel Distrib. Syst. 32, 4 (2020), 884–899.
  225. Sam Skalicky, Joshua Monson, Andrew Schmidt, and Matthew French. 2018. Hot & spicy: Improving productivity with Python and HLS for FPGAs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
  226. Atefeh Sohrabizadeh, Jie Wang, and Jason Cong. 2020. End-to-end optimization of deep learning applications. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20).
  227. Roman A. Solovyev, Alexandr A. Kalinin, Alexander G. Kustov, Dmitry V. Telpukhov, and Vladimir S. Ruhlov. 2018. FPGA implementation of convolutional neural networks with fixed-point calculations. Retrieved from https://arXiv:1808.09945.
  228. Lukas Sommer, Lukas Weber, Martin Kumm, and Andreas Koch. 2020. Comparison of arithmetic number formats for inference in sum-product networks on FPGAs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'20).
  229. Nitish Srivastava, Hongbo Rong, Prithayan Barua, Guanyu Feng, Huanqi Cao, Zhiru Zhang, David Albonesi, Vivek Sarkar, Wenguang Chen, Paul Petersen, et al. 2019. T2S-Tensor: Productively generating high-performance spatial hardware for dense tensor computations. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).
  230. Robert Stewart, Kirsty Duncan, Greg Michaelson, Paulo Garcia, Deepayan Bhowmik, and Andrew Wallace. 2018. RIPL: A parallel image processing language for FPGAs. ACM Trans. Reconfig. Technol. Syst. 11, 1 (2018), 1–24.
  231. Ian Swarbrick, Dinesh Gaitonde, Sagheer Ahmad, Bala Jayadev, Jeff Cuppett, Abbas Morshed, Brian Gaide, and Ygal Arbel. 2019. Versal network-on-chip (NoC). In Proceedings of the Symposium on High-Performance Interconnects (Hot Interconnects'19).
  232. Synthesijer. 2020. Synthesijer GitHub. Retrieved from https://github.com/synthesijer/synthesijer.
  233. Mingxing Tan, Steve Dai, Udit Gupta, and Zhiru Zhang. 2015. Mapping-aware constrained scheduling for LUT-Based FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'15).
  234. Mingxing Tan, Gai Liu, Ritchie Zhao, Steve Dai, and Zhiru Zhang. 2015. ElasticFlow: A complexity-effective approach for pipelining irregular loop nests. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'15).
  235. James Thomas, Pat Hanrahan, and Matei Zaharia. 2020. Fleet: A framework for massively parallel streaming on FPGAs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'20).
  236. Tim Todman and Wayne Luk. 2013. Runtime assertions and exceptions for streaming systems. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'13).
  237. Stephen M. Trimberger. 2018. Three ages of FPGAs: A retrospective on the first thirty years of FPGA technology: This paper reflects on how Moore's law has driven the design of FPGAs through three epochs: The age of invention, the age of expansion, and the age of accumulation. IEEE Solid-State Circ. Mag. 10, 2 (2018), 16–29.
  238. Ecenur Ustun, Chenhui Deng, Debjit Pal, Zhijing Li, and Zhiru Zhang. 2020. Accurate operation delay prediction for FPGA HLS using graph neural networks. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'20).
  239. Ecenur Ustun, Shaojie Xiang, Jinny Gui, Cunxi Yu, and Zhiru Zhang. 2019. LAMDA: Learning-assisted multi-stage autotuning for FPGA design closure. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).
  240. Shervin Vakili, J. M. Pierre Langlois, and Guy Bois. 2013. Enhanced precision analysis for accuracy-aware bit-width optimization using affine arithmetic. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 32, 12 (2013), 1853–1865.
  241. Anshuman Verma, Huiyang Zhou, Skip Booth, Robbie King, James Coole, Andy Keep, John Marshall, and Wu-chun Feng. 2017. Developing dynamic profiling and debugging support in OpenCL for FPGAs. In Proceedings of the Design Automation Conference (DAC'17).
  242. Chris C. Wang and Guy G. F. Lemieux. 2011. Scalable and deterministic timing-driven parallel placement for FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11).
  243. Dekui Wang, Zhenhua Duan, Cong Tian, Bohu Huang, and Nan Zhang. 2020. ParRA: A shared memory parallel FPGA router using hybrid partitioning approach. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 39, 4 (2020), 830–842.
  244. Han Wang, Robert Soulé, Huynh Tu Dang, Ki Suh Lee, Vishal Shrivastav, Nate Foster, and Hakim Weatherspoon. 2017. P4FPGA: A rapid prototyping framework for P4. In Proceedings of the Symposium on SDN Research.
  245. Jie Wang, Licheng Guo, and Jason Cong. 2021. AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21).
  246. Shibo Wang and Pankaj Kanwar. 2019. BFloat16: The secret to high performance on cloud TPUs. Google Cloud Blog (2019).
  247. Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Qinru Qiu, Yanzhi Wang, and Yun Liang. 2018. C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18).
  248. Xiaojun Wang and Miriam Leeser. 2010. VFloat: A variable precision fixed- and floating-point library for reconfigurable hardware. ACM Trans. Reconfig. Technol. Syst. 16, 3 (2010), 1–23.
  249. Yu Wang, James C. Hoe, and Eriko Nurvitadhi. 2019. Processor assisted worklist scheduling for FPGA accelerated graph processing on a shared-memory platform. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).
  250. Yuxin Wang, Peng Li, and Jason Cong. 2014. Theory and algorithm for generalized memory partitioning in high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'14).
  251. Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, and Jason Cong. 2013. Memory partitioning for multidimensional arrays in high-level synthesis. In Proceedings of the Design Automation Conference (DAC'13).
  252. Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. 2017. HopliteRT: An efficient FPGA NoC for real-time applications. In Proceedings of the International Conference on Field Programmable Technology (FPT'17).
  253. Richard Wei, Lane Schwartz, and Vikram Adve. 2017. DLVM: A modern compiler infrastructure for deep learning systems. Retrieved from https://arXiv:1711.03016.
  254. Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Proceedings of the Design Automation Conference (DAC'17).
  255. Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (2009), 65–76.
  256. Yuanlong Xiao, Dongjoon Park, Andrew Butt, Hans Giesen, Zhaoyang Han, Rui Ding, Nevo Magnezi, Raphael Rubin, and André DeHon. 2019. Reducing FPGA compile time with separate compilation for FPGA building blocks. In Proceedings of the International Conference on Field Programmable Technology (FPT'19).
  257. Xilinx. 2012. ChipScope Pro Software and Cores (UG029). Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/chipscope_pro_sw_cores_ug029.pdf.
  258. Xilinx. 2020. SDNet Packet Processor User Guide. Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_1/ug1012-sdnet-packet-processor.pdf.
  259. Xilinx. 2020. Vitis High-Level Synthesis User Guide. Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_2/ug1399-vitis-hls.pdf.
  260. Xilinx. 2020. Zynq UltraScale+ MPSoC. Retrieved from https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html.
  261. Yu Xing, Shuang Liang, Lingzhi Sui, Xijie Jia, Jiantao Qiu, Xin Liu, Yushun Wang, Yi Shan, and Yu Wang. 2019. DNNVM: End-to-end compiler leveraging heterogeneous optimizations on FPGA-based CNN accelerators. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 10, 2668–2681.
  262. Li Yang, Zhezhi He, and Deliang Fan. 2018. A fully onchip binarized convolutional neural network FPGA implementation with accurate inference. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED'18).
  263. Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, and Kurt Keutzer. 2019. Synetgy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19).
  264. Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose. 2007. Exploration and customization of FPGA-based soft processors. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 26, 2 (2007), 266–277.
  265. Cody Hao Yu, Peng Wei, Max Grossman, Peng Zhang, Vivek Sarkar, and Jason Cong. 2018. S2FA: An accelerator automation framework for heterogeneous computing in datacenters. In Proceedings of the Design Automation Conference (DAC'18).
  266. Jason Yu, Guy Lemieux, and Christopher Eagleston. 2008. Vector processing as a soft-core CPU accelerator. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'08). 222–232.
  267. David C. Zaretsky, Gaurav Mittal, Robert P. Dick, and Prith Banerjee. 2007. Balanced scheduling and operation chaining in high-level synthesis for FPGA designs. In Proceedings of the International Symposium on Quality Electronic Design (ISQED'07).
  268. Hanqing Zeng and Viktor Prasanna. 2020. GraphACT: Accelerating GCN training on CPU-FPGA heterogeneous platforms. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20).
  269. Yue Zha and Jing Li. 2020. Virtualizing FPGAs in the cloud. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'20). 845–858.
  270. Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'15).
  271. Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Peichen Pan, and Jason Cong. 2019. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 38, 11 (2019), 2072–2085.
  272. Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. 2018. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18).
  273. Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, and Zhiru Zhang. 2021. FracBNN: Accurate and FPGA-efficient binary neural networks with fractional activations. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21).
  274. Zhiru Zhang and Bin Liu. 2013. SDC-based modulo scheduling for pipeline synthesis. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'13).
  275. Jieru Zhao, Liang Feng, Sharad Sinha, Wei Zhang, Yun Liang, and Bingsheng He. 2017. COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17).
  276. Jieru Zhao, Tingyuan Liang, Sharad Sinha, and Wei Zhang. 2019. Machine learning based routing congestion prediction in FPGA high-level synthesis. In Proceedings of the Design, Automation, and Test in Europe (DATE'19).
  277. Ritchie Zhao, Weinan Song, Wentao Zhang, Tianwei Xing, Jeng-Hau Lin, Mani Srivastava, Rajesh Gupta, and Zhiru Zhang. 2017. Accelerating binarized convolutional neural networks with software-programmable FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17).
  278. Ritchie Zhao, Mingxing Tan, Steve Dai, and Zhiru Zhang. 2015. Area-efficient pipelining for FPGA-targeted high-level synthesis. In Proceedings of the Design Automation Conference (DAC'15).
  279. Zhipeng Zhao and James C. Hoe. 2017. Using Vivado-HLS for structural design: A NoC case study. Retrieved from https://arXiv:1710.10290.
  280. Guanwen Zhong, Alok Prakash, Yun Liang, Tulika Mitra, and Smail Niar. 2016. Lin-analyzer: A high-level performance analysis tool for FPGA-based accelerators. In Proceedings of the Design Automation Conference (DAC'16).
  281. Shijie Zhou, Rajgopal Kannan, Viktor K. Prasanna, Guna Seetharaman, and Qing Wu. 2019. HitGraph: High-throughput graph processing framework on FPGA. IEEE Trans. Parallel Distrib. Syst. 30, 10 (2019), 2249–2264.
  282. Yuan Zhou, Khalid Musa Al-Hawaj, and Zhiru Zhang. 2017. A new approach to automatic memory banking using trace-based address mining. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17).
  283. Wei Zuo, Peng Li, Deming Chen, Louis-Noël Pouchet, Shunan Zhong, and Jason Cong. 2013. Improving polyhedral code generation for high-level synthesis. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'13).
