Abstract
FPGA-based accelerators are increasingly popular across a broad range of applications because they offer massive parallelism, high energy efficiency, and great flexibility for customization. However, difficulties in programming and integrating FPGAs have hindered their widespread adoption. Since the mid-2000s, there has been extensive research and development toward making FPGAs accessible to software-inclined developers in addition to hardware specialists. Many programming models and automated synthesis tools, such as high-level synthesis, have been proposed to tackle this grand challenge. In this survey, we describe the progression and future prospects of the ongoing effort to significantly improve the software programmability of FPGAs. We first provide a taxonomy of the essential techniques for building a high-performance FPGA accelerator, which requires customizing the compute engines, memory hierarchy, and data representations. We then summarize a rich spectrum of work on programming abstractions and optimizing compilers that offer different trade-offs between performance and productivity. Finally, we highlight several additional challenges and opportunities that deserve more attention from the community to bring FPGA-based computing to the masses.
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. Retrieved from https://arXiv:1603.04467.Google Scholar
- Mohamed S. Abdelfattah and Vaughn Betz. 2014. Networks-on-Chip for FPGAs: Hard, soft or mixed?ACM Trans. Reconfig. Technol. Syst. 7, 3 (2014), 1–22. Google Scholar
Digital Library
- Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O'Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, et al. 2018. DLA: Compiler and FPGA overlay for neural network inference acceleration. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18). 411–4117.Google Scholar
Cross Ref
- Michael Adler, Kermin E. Fleming, Angshuman Parashar, Michael Pellauer, and Joel Emer. 2011. Leap scratchpads: Automatic memory and cache management for reconfigurable logic. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11). Google Scholar
Digital Library
- Jason Agron. 2009. Domain-specific language for HW/SW Co-Design for FPGAs. In IFIP Working Conference on Domain-Specific Languages. Google Scholar
Digital Library
- Muhammed Al Kadi, Benedikt Janssen, and Michael Huebner. 2016. FGPU: An SIMT-architecture for FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16). 254–263. Google Scholar
Digital Library
- Mythri Alle, Antoine Morvan, and Steven Derrien. 2013. Runtime dependency analysis for loop pipelining in high-level synthesis. In Proceedings of the Design Automation Conference (DAC'13). Google Scholar
Digital Library
- Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman Amarasinghe. 2019. Tiramisu: A polyhedral compiler for expressing fast and portable code. In Proceedings of the International Symposium on Code Generation and Optimization (CGO) (2019). Google Scholar
Digital Library
- Samridhi Bansal, Hsuan Hsiao, Tomasz Czajkowski, and Jason H. Anderson. 2018. High-level synthesis of software-customizable floating-point cores. In Proceedings of the Design, Automation, and Test in Europe (DATE'18).Google Scholar
- Cedric Bastoul. 2004. Code generation in the polyhedral model is easier than you think. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'04). Google Scholar
Digital Library
- Shuvra S. Bhattacharyya, Gordon Brebner, Jörn W. Janneck, Johan Eker, Carl Von Platen, Marco Mattavelli, and Mickaël Raulet. 2009. OpenDF: A dataflow toolset for reconfigurable hardware and multicore systems. ACM SIGARCH Comput. Architect. News 36, 5 (2009), 29–35. Google Scholar
Digital Library
- Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46, 5, 720–748. Google Scholar
Digital Library
- David Boland and George A. Constantinides. 2010. Automated precision analysis: A polynomial algebraic approach. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'10). Google Scholar
Digital Library
- David Boland and George A. Constantinides. 2012. A scalable approach for automated precision analysis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'12). Google Scholar
Digital Library
- Uday Bondhugula, Albert Hartono, Jagannathan Ramanujam, and Ponnuswamy Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08). Google Scholar
Digital Library
- Uday Bondhugula, Jagannathan Ramanujam, and Ponnuswamy Sadayappan. 2007. Automatic mapping of nested loops to FPGAs. In Proceedings of the ACM SIGPLAN Conference on Principles and Practice of Parallel Programming (PPoPP'07). Google Scholar
Digital Library
- Alexander Brant and Guy G. F. Lemieux. 2012. ZUMA: An open FPGA overlay architecture. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'12). Google Scholar
Digital Library
- Pavan Kumar Bussa, Jeffrey Goeders, and Steven J. E. Wilton. 2017. Accelerating In-System FPGA debug of high-level synthesis circuits using incremental compilation techniques. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'17).Google Scholar
- Cadence. 2020. Stratus High-Level Synthesis. Retrieved from https://www.cadence.com/content/dam/cadence-www/global/en_US/documents/tools/digital-design-signoff/stratus-ds.pdf.Google Scholar
- Nazanin Calagar, Stephen D. Brown, and Jason H. Anderson. 2014. Source-level debugging for FPGA high-level synthesis. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14). 1–8.Google Scholar
- Andrew Canis, Jason H. Anderson, and Stephen D. Brown. 2013. Multi-pumping for resource reduction in FPGA high-level synthesis. In Proceedings of the Design, Automation, and Test in Europe (DATE'13). Google Scholar
Digital Library
- Andrew Canis, Stephen D. Brown, and Jason H. Anderson. 2014. Modulo SDC scheduling with recurrence minimization in high-level synthesis. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).Google Scholar
- Andrew Canis, Jongsok Choi, Mark Aldham, Victor Zhang, Ahmed Kammoona, Jason H. Anderson, Stephen Brown, and Tomasz Czajkowski. 2011. LegUp: High-level synthesis for FPGA-based processor/accelerator systems. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11). Google Scholar
Digital Library
- Andrew Canis, Jongsok Choi, Blair Fort, Ruolong Lian, Qijing Huang, Nazanin Calagar, Marcel Gort, Jia Jun Qin, Mark Aldham, Tomasz Czajkowski, et al. 2013. From software to accelerators with LegUp high-level synthesis. In Proceedings of the International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES'13). Google Scholar
Digital Library
- Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, and Dhireesha Kudithipudi. 2019. Deep positron: A deep neural network using the posit number system. In Proceedings of the Design, Automation, and Test in Europe (DATE'19).Google Scholar
Cross Ref
- Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, et al. 2016. A cloud-scale acceleration architecture. In Proceedings of the International Symposium on Microarchitecture (MICRO'16). Google Scholar
Digital Library
- Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI'18). Google Scholar
Digital Library
- Tao Chen, Shreesha Srinath, Christopher Batten, and G. Edward Suh. 2018. An architectural framework for accelerating dynamic parallel algorithms on reconfigurable hardware. In Proceedings of the International Symposium on Microarchitecture (MICRO'18). Google Scholar
Digital Library
- Tao Chen and G. Edward Suh. 2016. Efficient data supply for hardware accelerators with prefetching and access/execute decoupling. In Proceedings of the International Symposium on Microarchitecture (MICRO'16). Google Scholar
Digital Library
- Xinyu Chen, Ronak Bajaj, Yao Chen, Jiong He, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2019. On-the-fly parallel data shuffling for graph processing on OpenCL-based FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'19).Google Scholar
Cross Ref
- Yao Chen, Swathi T. Gurumani, Yun Liang, Guofeng Li, Donghui Guo, Kyle Rupnow, and Deming Chen. 2016. FCUDA-NoC: A scalable and efficient network-on-chip implementation for the CUDA-to-FPGA flow. IEEE Trans. Very Large Scale Integr. Syst. 24, 6 (2016), 2220–2233.Google Scholar
Digital Library
- Yu Ting Chen and Jason H. Anderson. 2017. Automated generation of banked memory architectures in the high-level synthesis of multi-threaded software. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'17).Google Scholar
- Yu-Ting Chen, Jason Cong, Zhenman Fang, Jie Lei, and Peng Wei. 2016. When spark meets FPGAs: A case study for next-generation DNA sequencing acceleration. In Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud'16). Google Scholar
Digital Library
- Jianyi Cheng, Lana Josipovic, George A. Constantinides, Paolo Ienne, and John Wickerson. 2020. Combining dynamic & static scheduling in high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20). Google Scholar
Digital Library
- Yuze Chi, Jason Cong, Peng Wei, and Peipei Zhou. 2018. SODA: Stencil with optimized dataflow architecture. In Proceedings of the International Conference on Computer-Aided Design (ICCAD) (2018). Google Scholar
Digital Library
- Yuze Chi, Licheng Guo, Young-kyu Choi, Jie Wang, and Jason Cong. 2021. Extending high-level synthesis for task-parallel programs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21). Google Scholar
Digital Library
- Jongsok Choi, Stephen Brown, and Jason Anderson. 2013. From software threads to parallel hardware in high-level synthesis for FPGAs. In Proceedings of the International Conference on Field Programmable Technology (FPT'13).Google Scholar
Cross Ref
- Jongsok Choi, Kevin Nam, Andrew Canis, Jason Anderson, Stephen Brown, and Tomasz Czajkowski. 2012. Impact of cache architecture and interface on performance and area of FPGA-based processor/parallel-accelerator systems. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'12). Google Scholar
Digital Library
- Young-kyu Choi, Yuze Chi, Weikang Qiao, Nikola Samardzic, and Jason Cong. 2021. HBM connect: High-performance HLS interconnect for FPGA HBM. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21). Google Scholar
Digital Library
- Young-Kyu Choi, Yuze Chi, Jie Wang, and Jason Cong. 2020. FLASH: Fast, parallel, and accurate simulator for HLS. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. (2020).Google Scholar
- Young-kyu Choi and Jason Cong. 2017. HLScope: High-level performance debugging for FPGA designs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'17).Google Scholar
- Young-kyu Choi and Jason Cong. 2018. HLS-based optimization and design space exploration for applications with variable loop bounds. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18). Google Scholar
Digital Library
- Young-kyu Choi, Jason Cong, Zhenman Fang, Yuchen Hao, Glenn Reinman, and Peng Wei. 2016. A quantitative analysis on microarchitectures of modern CPU-FPGA platforms. In Proceedings of the Design Automation Conference (DAC'16). Google Scholar
Digital Library
- Young-Kyu Choi, Jason Cong, Zhenman Fang, Yuchen Hao, Glenn Reinman, and Peng Wei. 2019. In-depth analysis on microarchitectures of modern heterogeneous CPU-FPGA platforms. ACM Trans. Reconfig. Technol. Syst. (2019). Google Scholar
Digital Library
- Young-kyu Choi, Peng Zhang, Peng Li, and Jason Cong. 2017. HLScope+: Fast and accurate performance estimation for FPGA HLS. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17). Google Scholar
Digital Library
- Christopher H. Chou, Aaron Severance, Alex D. Brant, Zhiduo Liu, Saurabh Sant, and Guy G. F. Lemieux. 2011. VEGAS: Soft vector processor with scratchpad memory. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11). Google Scholar
Digital Library
- Eric S. Chung, James C. Hoe, and Ken Mai. 2011. CoRAM: An in-fabric memory architecture for FPGA-based computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11). Google Scholar
Digital Library
- Alessandro Cilardo and Luca Gallo. 2015. Improving multibank memory access parallelism with lattice-based partitioning. ACM Trans. Architect. Code Optimiz. 11, 4 (2015). Google Scholar
Digital Library
- Albert Cohen, Marc Sigler, Sylvain Girbal, Olivier Temam, David Parello, and Nicolas Vasilache. 2005. Facilitating the search for compositions of program transformations. In Proceedings of the International Symposium on Supercomputing (ICS'05). Google Scholar
Digital Library
- Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, and Peipei Zhou. 2018. Best-effort FPGA programming: A few steps can go a long way. Retrieved from https://arXiv:1807.01340.Google Scholar
- Jason Cong, Zhenman Fang, Muhuan Huang, Libo Wang, and Di Wu. 2017. CPU-FPGA coscheduling for big data applications. IEEE Design Test 35, 1 (2017), 16–22.Google Scholar
Cross Ref
- Jason Cong, Zhenman Fang, Muhuan Huang, Peng Wei, Di Wu, and Cody Hao Yu. 2018. Customizable computing–from single chip to datacenters. Proc. IEEE 107, 1 (2018), 185–203.Google Scholar
Cross Ref
- Jason Cong, Zhenman Fang, Michael Lo, Hanrui Wang, Jingxian Xu, and Shaochong Zhang. 2018. Understanding performance differences of FPGAs and GPUs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
- Jason Cong, Muhuan Huang, Peichen Pan, Yuxin Wang, and Peng Zhang. 2016. Source-to-source optimization for HLS. FPGAs Softw. Program. (2016).Google Scholar
- Jason Cong, Wei Jiang, Bin Liu, and Yi Zou. 2011. Automatic memory partitioning and scheduling for throughput and power optimization. ACM Trans. Design Autom. Electron. Syst. 16, 2 (2011), 1–25. Google Scholar
Digital Library
- Jason Cong, Peng Li, Bingjun Xiao, and Peng Zhang. 2016. An optimal microarchitecture for stencil computation acceleration based on nonuniform partitioning of data reuse buffers. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 35, 3 (2016), 407–418. Google Scholar
Digital Library
- Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, and Zhiru Zhang. 2011. High-level synthesis for FPGAs: From prototyping to deployment. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 30, 4 (2011), 473–491. Google Scholar
Digital Library
- Jason Cong and Jie Wang. 2018. PolySA: Polyhedral-based systolic array auto-compilation. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18). Google Scholar
Digital Library
- Jason Cong, Peng Wei, Cody Hao Yu, and Peng Zhang. 2018. Automated accelerator generation and optimization with composable, parallel and pipeline architecture. In Proceedings of the Design Automation Conference (DAC'18). Google Scholar
Digital Library
- Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 2017. Bandwidth optimization through on-chip memory restructuring for HLS. In Proceedings of the Design Automation Conference (DAC'17). Google Scholar
Digital Library
- Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 2018. Latte: Locality aware transformation for high-level synthesis. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
Cross Ref
- Jason Cong and Zhiru Zhang. 2006. An efficient and versatile scheduling algorithm based on SDC formulation. In Proceedings of the Design Automation Conference (DAC'06). Google Scholar
Digital Library
- James Coole and Greg Stitt. 2010. Intermediate fabrics: Virtual architectures for circuit portability and fast placement and routing. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'10). Google Scholar
Digital Library
- Philippe Coussy, Cyrille Chavet, Pierre Bomel, Dominique Heller, Eric Senn, and Eric Martin. 2008. GAUT: A high-level synthesis tool for DSP applications. High-Level Synth. (2008).Google Scholar
- John Curreri, Seth Koehler, Alan D. George, Brian Holland, and Rafael Garcia. 2010. Performance analysis framework for high-level language applications in reconfigurable computing. ACM Trans. Reconfig. Technol. Syst. 3, 1 (2010), 1–23. Google Scholar
Digital Library
- Tomasz S. Czajkowski, Utku Aydonat, Dmitry Denisenko, John Freeman, Michael Kinsner, David Neto, Jason Wong, Peter Yiannacouras, and Deshanand P. Singh. 2012. From OpenCL to high-performance hardware on FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'12).Google Scholar
- Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph processing framework on FPGA a case study of breadth-first search. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16). Google Scholar
Digital Library
- Steve Dai, Gai Liu, and Zhiru Zhang. 2018. A scalable approach to exact resource-constrained scheduling based on a joint SDC and SAT formulation. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18). Google Scholar
Digital Library
- Steve Dai, Gai Liu, Ritchie Zhao, and Zhiru Zhang. 2017. Enabling adaptive loop pipelining in high-level synthesis. In Proceedings of the Asilomar Conference on Signals, Systems, and Computers.Google Scholar
Cross Ref
- Steve Dai, Mingxing Tan, Kecheng Hao, and Zhiru Zhang. 2014. Flushing-enabled loop pipelining for high-level synthesis. In Proceedings of the Design Automation Conference (DAC'14). Google Scholar
Digital Library
- Steve Dai, Ritchie Zhao, Gai Liu, Shreesha Srinath, Udit Gupta, Christopher Batten, and Zhiru Zhang. 2017. Dynamic hazard resolution for pipelining irregular loops in high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17). Google Scholar
Digital Library
- Steve Dai, Yuan Zhou, Hang Zhang, Ecenur Ustun, Evangeline F. Y. Young, and Zhiru Zhang. 2018. Fast and accurate estimation of quality of results in high-level synthesis with machine learning. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
Cross Ref
- Luka Daoud, Dawid Zydek, and Henry Selvaraj. 2014. A survey of high level synthesis languages, tools, and compilers for reconfigurable high performance computing. Adv. Syst. Sci. (2014).Google Scholar
- Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao, Ming Liu, Jeremy Fowers, Kalin Ovtcharov, Anna Vinogradsky, Sarah Massengill, Lita Yang, Ray Bittner, et al. 2020. Pushing the limits of narrow precision inferencing at cloud scale with microsoft floating point. Adv. Neural Info. Process. Syst. (2020).Google Scholar
- Florent De Dinechin and Bogdan Pasca. 2011. Designing custom arithmetic data paths with FloPoCo. IEEE Design Test Comput. 28, 4 (2011), 18–27. Google Scholar
Digital Library
- Luiz Henrique De Figueiredo and Jorge Stolfi. 2004. Affine arithmetic: Concepts and applications. Numer. Algor. (2004).Google Scholar
- Johannes de Fine Licht, Simon Meierhans, and Torsten Hoefler. 2018. Transformations of high-level synthesis codes for high-performance computing. Retrieved from https://arXiv:1805.08288.Google Scholar
- Steven Derrien, Thibaut Marty, Simon Rokicki, and Tomofumi Yuki. 2020. Toward speculative loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 11 (2020), 4229–4239.Google Scholar
Cross Ref
- Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, et al. 2018. Fast inference of deep neural networks in FPGAs for particle physics. J. Instrument. (2018).Google Scholar
- David Durst, Matthew Feldman, Dillon Huff, David Akeley, Ross Daly, Gilbert Louis Bernstein, Marco Patrignani, Kayvon Fatahalian, and Pat Hanrahan. 2020. Type-directed scheduling of streaming accelerators. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'20). Google Scholar
Digital Library
- Stephen A. Edwards, Richard Townsend, Martha Barker, and Martha A. Kim. 2019. Compositional dataflow circuits. ACM Trans. Embed. Comput. Syst. 18, 1 (2019), 1–27. Google Scholar
Digital Library
- Johan Eker and J. Janneck. 2003. CAL language report: Specification of the CAL actor language. ERL Tech. Memo UCB/ERL (2003).Google Scholar
- Fatemeh Eslami and Steven J. E. Wilton. 2018. Rapid triggering capability using an adaptive overlay during FPGA debug. ACM Trans. Design Autom. Electron. Syst. 23, 6 (2018), 1–25. Google Scholar
Digital Library
- Zhenman Fang, Farnoosh Javadi, Jason Cong, and Glenn Reinman. 2019. Understanding performance gains of accelerator-rich architectures. In Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors (ASAP'19).Google Scholar
Cross Ref
- Lorenzo Ferretti, Jihye Kwon, Giovanni Ansaloni, Giuseppe Di Guglielmo, Luca P. Carloni, and Laura Pozzi. 2020. Leveraging prior knowledge for effective design-space exploration in high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 11 (2020), 3736–3747.Google Scholar
Cross Ref
- Pietro Fezzardi, Marco Lattuada, and Fabrizio Ferrandi. 2017. Using efficient path profiling to optimize memory consumption of on-chip debugging for high-level synthesis. ACM Trans. Embed. Comput. Syst. 16, 5s (2017), 1–22. Google Scholar
Digital Library
- Christian Fobel, Gary Grewal, and Deborah Stacey. 2014. A scalable, serially equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).Google Scholar
Cross Ref
- Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, et al. 2018. A configurable cloud-scale DNN processor for real-time AI. In Proceedings of the International Symposium on Computer Architecture (ISCA'18). Google Scholar
Digital Library
- Tushar Garg, Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. 2020. HopliteBuf: Network calculus-based design of FPGA NoCs with provably stall-free FIFOs. ACM Trans. Reconfig. Technol. Syst. 13, 2 (2020). Google Scholar
Digital Library
- Mohammad Ghasemzadeh, Mohammad Samragh, and Farinaz Koushanfar. 2018. ReBNet: Residual binarized neural network. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
Cross Ref
- Jeffrey Goeders and Steven J. E. Wilton. 2014. Effective FPGA debug for high-level synthesis generated circuits. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).Google Scholar
- Jeffrey Goeders and Steven J. E. Wilton. 2016. Signal-tracing techniques for In-System FPGA debugging of high-level synthesis circuits. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 36, 1 (2016), 83–96. Google Scholar
Digital Library
- Jeffrey B. Goeders, Guy G. F. Lemieux, and Steven J. E. Wilton. 2011. Deterministic timing-driven parallel placement by simulated annealing using half-box window decomposition. In Proceedings of the International Conference on Reconfigruable Computing and FPGAs (ReConFig'11). Google Scholar
Digital Library
- Marcel Gort and Jason H. Anderson. 2010. Deterministic multi-core parallel routing for FPGAs. In Proceedings of the International Conference on Field Programmable Technology (FPT'10).Google Scholar
- Marcel Gort and Jason H. Anderson. 2013. Range and bitmask analysis for hardware optimization in high-level synthesis. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC'13).Google Scholar
- Ian Gray, Yu Chan, Jamie Garside, Neil Audsley, and Andy Wellings. 2015. Transparent hardware synthesis of Java for predictable large-scale distributed systems. Retrieved from https://arXiv:1508.07142.Google Scholar
- Paul Grigoraş, Xinyu Niu, Jose G. F. Coutinho, Wayne Luk, Jacob Bower, and Oliver Pell. 2013. Aspect driven compilation for dataflow designs. In Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors (ASAP'13). Google Scholar
Digital Library
- Tobias Grosser, Armin Groesslinger, and Christian Lengauer. 2012. Polly–performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. (2012).Google Scholar
- Sikender Gul, Muhammad Faisal Siddiqui, and Naveed Ur Rehman. 2019. FPGA based real-time implementation of online EMD with fixed point architecture. IEEE Access 7 (2019), 176565–176577.Google Scholar
Cross Ref
- Licheng Guo, Yuze Chi, Jie Wang, Jason Lau, Weikang Qiao, Ecenur Ustun, Zhiru Zhang, and Jason Cong. 2021. AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21). Google Scholar
Digital Library
- Licheng Guo, Jason Lau, Yuze Chi, Jie Wang, Cody Hao Yu, Zhe Chen, Zhiru Zhang, and Jason Cong. 2020. Analysis and optimization of the implicit broadcasts in FPGA HLS to improve maximum frequency. In Proceedings of the Design Automation Conference (DAC'20). Google Scholar
Digital Library
- Licheng Guo, Jason Lau, Zhenyuan Ruan, Peng Wei, and Jason Cong. 2019. Hardware acceleration of long read pairwise overlapping in genome sequencing: A race between FPGA and GPU. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).Google Scholar
Cross Ref
- Peng Guo, Hong Ma, Ruizhi Chen, Pin Li, Shaolin Xie, and Donglin Wang. 2018. FBNA: A fully binarized neural network accelerator. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).Google Scholar
Cross Ref
- Tae Jun Ham, Juan L. Aragón, and Margaret Martonosi. 2017. Decoupling data supply from computation for latency-tolerant communication in heterogeneous architectures. ACM Trans. Architect. Code Optimiz. 14, 2 (2017), 1–27. Google Scholar
Digital Library
- Mohamed Ben Hammouda, Philippe Coussy, and Loïc Lagadec. 2014. A design approach to automatically synthesize ANSI-C assertions during high-level synthesis of hardware accelerators. In Proceedings of the International Symposium on Circuits and Systems (ISCAS'14).Google Scholar
Cross Ref
- Frank Hannig, Holger Ruckdeschel, Hritam Dutta, and Jürgen Teich. 2008. PARO: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Proceedings of the International Workshop on Applied Reconfigurable Computing (ARC'08). Google Scholar
Digital Library
- James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, and Pat Hanrahan. 2014. Darkroom: Compiling high-level image processing code into hardware pipelines. ACM Trans. Graph. 33, 4 (2014), 144:1–144:11. Google Scholar
Digital Library
- James Hegarty, Ross Daly, Zachary DeVito, Jonathan Ragan-Kelley, Mark Horowitz, and Pat Hanrahan. 2016. Rigel: Flexible multi-rate image processing hardware. ACM Trans. Graph. 25, 4 (2016), 1–11. Google Scholar
Digital Library
- Timothy Hickey, Qun Ju, and Maarten H. Van Emden. 2001. Interval arithmetic: From principles to implementation. J. ACM 48, 5 (2001), 1038–1068. Google Scholar
Digital Library
- Daniel Holanda Noronha, Ruizhe Zhao, Jeff Goeders, Wayne Luk, and Steven J. E. Wilton. 2019. On-Chip FPGA debug instrumentation for machine learning applications. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19). Google Scholar
Digital Library
- Chin Hau Hoo and Akash Kumar. 2018. ParaDRo: A parallel deterministic router based on spatial partitioning and scheduling. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18). Google Scholar
Digital Library
- Amir Hormati, Manjunath Kudlur, Scott Mahlke, David Bacon, and Rodric Rabbah. 2008. Optimus: Efficient realization of streaming applications on FPGAs. In Proceedings of the International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES'08). Google Scholar
Digital Library
- Hsuan Hsiao and Jason Anderson. 2019. Thread weaving: Static resource scheduling for multithreaded high-level synthesis. In Proceedings of the Design Automation Conference (DAC'19). Google Scholar
Digital Library
- Bohu Huang and Haibin Zhang. 2013. Application of multi-core parallel computing in FPGA placement. In Proceedings of the International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA'13).Google Scholar
Cross Ref
- Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, and Jason Cong. 2016. Programming and runtime support to blaze FPGA accelerator deployment at datacenter scale. In Proceedings of the ACM Symposium on Cloud Computing. Google Scholar
Digital Library
- Sitao Huang, Gowthami Jayashri Manikandan, Anand Ramachandran, Kyle Rupnow, Wen-mei W Hwu, and Deming Chen. 2017. Hardware acceleration of the pair-HMM algorithm for DNA variant calling. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17). Google Scholar
Digital Library
- Yuanjie Huang, Paolo Ienne, Olivier Temam, Yunji Chen, and Chengyong Wu. 2013. Elastic CGRAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'13). Google Scholar
Digital Library
- Stephen Ibanez, Gordon Brebner, Nick McKeown, and Noa Zilberman. 2019. The P4->NetFPGA workflow for line-rate packet processing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19). Google Scholar
Digital Library
- Mohsen Imani, Samuel Bosch, Sohum Datta, Sharadhi Ramakrishna, Sahand Salamat, Jan M. Rabaey, and Tajana Rosing. 2019. QuantHD: A quantization framework for hyperdimensional computing. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 10 (2019), 2268–2278.Google Scholar
Cross Ref
- Mohsen Imani, Sahand Salamat, Behnam Khaleghi, Mohammad Samragh, Farinaz Koushanfar, and Tajana Rosing. 2019. SparseHD: Algorithm-hardware co-optimization for efficient high-dimensional computing. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).Google Scholar
- Intel. 2019. Intel Agilex F-Series FPGAs & SoCs. Retrieved from https://www.intel.com/content/www/us/en/products/programmable/fpga/agilex/f-series.html.Google Scholar
- Intel. 2020. Intel High Level Synthesis Compiler Pro Edition: Reference Manual. Retrieved from https://www.intel.com/content/www/us/en/programmable/documentation/ewa1462824960255.html.Google Scholar
- Intel. 2020. Intel SoC FPGAs. Retrieved from https://www.intel.ca/content/www/ca/en/products/programmable/soc.html.Google Scholar
- Intel. 2020. The oneAPI Specification. Retrieved from https://www.oneapi.com/.Google Scholar
- Christian Iseli and Eduardo Sanchez. 1993. Spyder: A reconfigurable VLIW processor using FPGAs. In Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines.Google Scholar
- Asif Islam and Nachiket Kapre. 2018. LegUp-NoC: High-level synthesis of loops with indirect addressing. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
- Manish Kumar Jaiswal and Ray C. C. Cheung. 2013. Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support. Microelectr. J. 44, 5 (2013), 421–430. Google Scholar
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the International Conference on Multimedia. Google Scholar
- Jiantong Jiang, Zeke Wang, Xue Liu, Juan Gómez-Luna, Nan Guan, Qingxu Deng, Wei Zhang, and Onur Mutlu. 2020. Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20). Google Scholar
- Lana Josipovic, Philip Brisk, and Paolo Ienne. 2017. An out-of-order load-store queue for spatial computing. ACM Trans. Embed. Comput. Syst. 16, 5s (2017), 1–19. Google Scholar
- Lana Josipović, Radhika Ghosal, and Paolo Ienne. 2018. Dynamically scheduled high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18). Google Scholar
- Lana Josipovic, Andrea Guerrieri, and Paolo Ienne. 2019. Speculative dataflow circuits. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19). Google Scholar
- Juniper. 2020. Juniper: Java Platform for High-performance and Real-time Large-scale Data. Retrieved from http://www.juniper-project.org/.
- Nachiket Kapre et al. 2018. Hoplite-Q: Priority-aware routing in FPGA overlay NoCs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
- Nachiket Kapre and Jan Gray. 2015. Hoplite: Building austere overlay NoCs for FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'15).Google Scholar
- Nachiket Kapre and Deheng Ye. 2016. GPU-Accelerated high-level synthesis for bitwidth optimization of FPGA datapaths. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16). Google Scholar
- Soguy Mak karé Gueye, Gwenaël Delaval, Eric Rutten, Dominique Heller, and Jean-Philippe Diguet. 2018. A domain-specific language for autonomic managers in FPGA reconfigurable architectures. In Proceedings of the International Conference on Autonomic Computing (ICAC'18).Google Scholar
- Ryan Kastner, Janarbek Matai, and Stephen Neuendorffer. 2018. Parallel programming for FPGAs. Retrieved from https://arXiv:1805.03648.Google Scholar
- Keras. 2020. Keras. Simple. Flexible. Powerful. Retrieved from https://keras.io/.
- Ronan Keryell and Lin-Ya Yu. 2018. Early experiments using SYCL single-source modern C++ on Xilinx FPGA: Extended Abstract of technical presentation. In Proceedings of the International Workshop on OpenCL. Google Scholar
- Ahmed Khawaja, Joshua Landgraf, Rohith Prakash, Michael Wei, Eric Schkufza, and Christopher J. Rossbach. 2018. Sharing, protection, and compatibility for reconfigurable fabric with AmorphOS. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI'18).
- Soroosh Khoram, Jialiang Zhang, Maxwell Strange, and Jing Li. 2018. Accelerating graph analytics by co-optimizing storage and access on an FPGA-HMC platform. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18). Google Scholar
- Jeffrey Kingyens and J. Gregory Steffan. 2011. The potential for a GPU-Like overlay architecture for FPGAs. Int. J. Reconfig. Comput. (2011).
- Adam B. Kinsman and Nicola Nicolici. 2009. Finite precision bit-width allocation using SAT-Modulo theory. In Design, Automation, and Test in Europe (DATE'09). Google Scholar
- Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The tensor algebra compiler. Proc. ACM Program. Lang. (2017). Google Scholar
- Ana Klimovic and Jason H. Anderson. 2013. Bitwidth-optimized hardware accelerators with software fallback. In Proceedings of the International Conference on Field Programmable Technology (FPT'13).Google Scholar
- David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis et al. 2018. Spatial: A language and compiler for application accelerators. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'18). Google Scholar
- David Koeplinger, Raghu Prabhakar, Yaqi Zhang, Christina Delimitrou, Christos Kozyrakis, and Kunle Olukotun. 2016. Automatic generation of efficient accelerators for reconfigurable hardware. In Proceedings of the International Symposium on Computer Architecture (ISCA'16). Google Scholar
- Maciej Kurek, Tobias Becker, Thomas C. P. Chau, and Wayne Luk. 2014. Automating optimization of reconfigurable designs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'14). Google Scholar
- Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. 2019. HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19). Google Scholar
- Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, et al. 2020. SuSy: A programming model for productive construction of high-performance systolic arrays on FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'20). Google Scholar
- Chris Lavin and Alireza Kaviani. 2018. RapidWright: Enabling custom crafted implementations for FPGAs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
- D.-U. Lee, Altaf Abdul Gaffar, Ray C. C. Cheung, Oskar Mencer, Wayne Luk, and George A. Constantinides. 2006. Accuracy-guaranteed bit-width optimization. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 25, 10 (2006), 1990–2000. Google Scholar
- David M. Lewis, Marcus H. van Ierssel, and Daniel H. Wong. 1993. A field programmable accelerator for compiled-code applications. In Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines.Google Scholar
- Peng Li, Louis-Noël Pouchet, and Jason Cong. 2014. Throughput optimization for high-level synthesis using resource constraints. In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT'14).Google Scholar
- Wuxi Li, Meng Li, Jiajun Wang, and David Z. Pan. 2017. UTPlaceF 3.0: A parallelization framework for modern FPGA global placement. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17). Google Scholar
- Shuang Liang, Shouyi Yin, Leibo Liu, Wayne Luk, and Shaojun Wei. 2018. FP-BNN: Binarized neural network on FPGA. Neurocomputing (2018).Google Scholar
- Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Haoxing Ren, Brucek Khailany, and David Z. Pan. 2020. DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 40, 4 (2020), 748–761.Google Scholar
- Junyi Liu, Samuel Bayliss, and George A. Constantinides. 2015. Offline synthesis of online dependence testing: Parametric loop pipelining for HLS. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'15). Google Scholar
- Ji Liu, Abdullah-Al Kafi, Xipeng Shen, and Huiyang Zhou. 2020. MKPipe: A compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. In Proceedings of the International Conference on Supercomputing (ICS'20).
- Junyi Liu, John Wickerson, Samuel Bayliss, and George A. Constantinides. 2017. Polyhedral-based dynamic loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 37, 9 (2017), 1802–1815. Google Scholar
- Leo Liu, Jay Weng, and Nachiket Kapre. 2019. RapidRoute: Fast assembly of communication structures for FPGA overlays. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).Google Scholar
- Qiang Liu, George A. Constantinides, Konstantinos Masselos, and Peter Y. K. Cheung. 2007. Automatic on-chip memory minimization for data reuse. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'07). Google Scholar
- Charles Lo and Paul Chow. 2016. Model-based optimization of high level synthesis directives. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'16).Google Scholar
- Charles Lo and Paul Chow. 2018. Multi-fidelity optimization for high-level synthesis directives. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).Google Scholar
- Charles Lo and Paul Chow. 2020. Hierarchical modelling of generators in design-space exploration. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'20).Google Scholar
- Alec Lu, Zhenman Fang, Weihua Liu, and Lesley Shannon. 2021. Demystifying the memory system of modern datacenter FPGAs for software programmers through microbenchmarking. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21). Google Scholar
- Adrian Ludwin and Vaughn Betz. 2011. Efficient and deterministic parallel placement for FPGAs. ACM Trans. Design Autom. Electron. Syst. 16, 3 (2011), 1–23. Google Scholar
- Adrian Ludwin, Vaughn Betz, and Ketan Padalia. 2008. High-quality, deterministic parallel placement for FPGAs on commodity hardware. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'08). Google Scholar
- Rui Ma, Jia-Ching Hsu, Tian Tan, Eriko Nurvitadhi, David Sheffield, Rob Pelt, Martin Langhammer, Jaewoong Sim, Aravind Dasu, and Derek Chiou. 2019. Specializing FGPU for persistent deep learning. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'19). 326–333.Google Scholar
- Xiaoyin Ma, Walid A. Najjar, and Amit K. Roy-Chowdhury. 2015. Evaluation and acceleration of high-throughput fixed-point object detection on FPGAs. IEEE Trans. Circ. Syst. Video Technol. 25, 6 (2015), 1051–1062.Google Scholar
- Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, and Hadi Esmaeilzadeh. 2016. TABLA: A unified template-based framework for accelerating statistical machine learning. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA'16).Google Scholar
- Hosein Mohammadi Makrani, Farnoud Farahmand, Hossein Sayadi, Sara Bondi, Sai Manoj Pudukotai Dinakarrao, Houman Homayoun, and Setareh Rafatirad. 2019. Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'19).Google Scholar
- Maxeler. 2020. Maxeler High-performance Dataflow Computing Systems. Retrieved from https://www.maxeler.com/products/software/maxcompiler/.Google Scholar
- Séamas McGettrick, Kunjan Patel, and Chris Bleakley. 2011. High performance programmable FPGA overlay for digital signal processing. In Proceedings of the International Conference on Reconfigurable Computing: Architectures, Tools and Applications (ARC'11). Google Scholar
- Atefeh Mehrabi, Aninda Manocha, Benjamin C. Lee, and Daniel J. Sorin. 2020. Prospector: Synthesizing efficient accelerators via statistical learning. In Proceedings of the Design, Automation, and Test in Europe (DATE'20). Google Scholar
- Richard Membarth, Oliver Reiche, Frank Hannig, Jürgen Teich, Mario Körner, and Wieland Eckert. 2016. Hipacc: A domain-specific language and compiler for image processing. IEEE Trans. Parallel Distrib. Syst. 27, 1 (2016), 210–224. Google Scholar
- Mentor. 2020. Catapult High-Level Synthesis. Retrieved from https://s3.amazonaws.com/s3.mentor.com/public_documents/datasheet/hls-lp/catapult-high-level-synthesis.pdf.Google Scholar
- Microchip. 2020. LegUp 9.1 Documentation. Retrieved from https://download-soc.microsemi.com/FPGA/HLS-EAP/docs/legup-9.1-docs/index.html.Google Scholar
- Microchip. 2020. Microchip Acquires High-Level Synthesis Tool Provider LegUp to Simplify Development of PolarFire FPGA-based Edge Compute Solutions. Retrieved from https://www.microchip.com/en-us/about/news-releases/products/microchip-acquires-high-level-synthesis-tool-provider-legup.Google Scholar
- Microsoft. 2020. A Microsoft Custom Data Type for Efficient Inference. Retrieved from https://www.microsoft.com/en-us/research/blog/a-microsoft-custom-data-type-for-efficient-inference/.Google Scholar
- Peter Milder, Franz Franchetti, James C. Hoe, and Markus Püschel. 2012. Computer generation of hardware for linear digital signal processing transforms. ACM Trans. Design Autom. Electron. Syst. 16, 3 (2012), 1–23. Google Scholar
- Yehdhih Moctar, Mirjana Stojilović, and Philip Brisk. 2018. Deterministic parallel routing for FPGAs based on Galois parallel execution model. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).
- Joshua S. Monson and Brad Hutchings. 2014. New approaches for in-system debug of behaviorally synthesized FPGA circuits. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'14).Google Scholar
- Joshua S. Monson and Brad L. Hutchings. 2015. Using source-level transformations to improve high-level synthesis debug and validation on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'15). Google Scholar
- Joshua S. Monson and Brad L. Hutchings. 2018. Enhancing debug observability for HLS-based FPGA circuits through source-to-source compilation. J. Parallel Distrib. Comput. 117 (2018), 148–160.Google Scholar
- Thierry Moreau, Tianqi Chen, Ziheng Jiang, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. VTA: An open hardware-software stack for deep learning. Retrieved from https://arXiv:1807.04188.Google Scholar
- Antoine Morvan, Steven Derrien, and Patrice Quinton. 2013. Polyhedral bubble insertion: A method to improve nested loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 32, 3 (2013), 339–352. Google Scholar
- Kevin E. Murray, Mohamed A. Elgammal, Vaughn Betz, Tim Ansell, Keith Rothman, and Alessandro Comodi. 2020. SymbiFlow and VPR: An open-source design flow for commercial and novel FPGAs. IEEE Micro (2020).Google Scholar
- Kevin E. Murray, Oleg Petelin, Sheng Zhong, Jia Min Wang, Mohamed Eldafrawy, Jean-Philippe Legault, Eugene Sha, Aaron G. Graham, Jean Wu, Matthew J. P. Walker, et al. 2020. VTR 8: High-performance CAD and customizable FPGA architecture modelling. ACM Trans. Reconfig. Technol. Syst. 13, 2 (2020), 339–352. Google Scholar
- Razvan Nane, Vlad-Mihai Sima, Christian Pilato, Jongsok Choi, Blair Fort, Andrew Canis, Yu Ting Chen, Hsuan Hsiao, Stephen Brown, Fabrizio Ferrandi, et al. 2016. A survey and evaluation of FPGA high-level synthesis tools. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 35, 10 (2016), 1591–1604. Google Scholar
- Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, Yuwei Ye, Apurva Koti, Adrian Sampson, and Zhiru Zhang. 2020. Predictable accelerator design with time-sensitive affine types. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'20). Google Scholar
- Mostafa W. Numan, Braden J. Phillips, Gavin S. Puddy, and Katrina Falkner. 2020. Towards automatic high-level code deployment on reconfigurable platforms: A survey of high-level synthesis tools and toolchains. IEEE Access 8 (2020), 174692–174722.Google Scholar
- Eriko Nurvitadhi, Gabriel Weisz, Yu Wang, Skand Hurkat, Marie Nguyen, James C. Hoe, José F Martínez, and Carlos Guestrin. 2014. GraphGen: An FPGA framework for vertex-centric graph computation. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'14). Google Scholar
- William George Osborne, Ray C. C. Cheung, José Gabriel F. Coutinho, Wayne Luk, and Oskar Mencer. 2007. Automatic accuracy-guaranteed bit-width optimization for fixed and floating-point systems. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'07).Google Scholar
- Ganda Stephane Ouedraogo, Matthieu Gautier, and Olivier Sentieys. 2014. A frame-based domain-specific language for rapid prototyping of FPGA-based software-defined radios. EURASIP J. Adv. Signal Process. 1 (2014), 1–15.Google Scholar
- M. Akif Özkan, Oliver Reiche, Frank Hannig, and Jürgen Teich. 2016. FPGA-based accelerator design from a domain-specific language. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'16).Google Scholar
- Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, and Wen-Mei W. Hwu. 2009. FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs. In Proceedings of the Symposium on Application Specific Processors (SASP'09).Google Scholar
- Philippos Papaphilippou, Jiuxi Meng, and Wayne Luk. 2020. High-performance FPGA network switch architecture. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20). Google Scholar
- Dongjoon Park, Yuanlong Xiao, Nevo Magnezi, and André DeHon. 2018. Case for fast FPGA compilation using partial reconfiguration. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'18).Google Scholar
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. Retrieved from https://arXiv:1912.01703.
- Ryan Pattison, Christian Fobel, Gary Grewal, and Shawki Areibi. 2015. Scalable analytic placement for FPGA on GPGPU. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig'15).
- Francesco Peverelli, Marco Rabozzi, Emanuele Del Sozzo, and Marco D. Santambrogio. 2018. OXiGen: A tool for automatic acceleration of C functions into dataflow FPGA-based kernels. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW'18).Google Scholar
- Christian Pilato and Fabrizio Ferrandi. 2013. Bambu: A modular framework for the high level synthesis of memory-intensive applications. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'13).Google Scholar
- Christian Pilato, Daniele Loiacono, Antonino Tumeo, Fabrizio Ferrandi, Pier Luca Lanzi, and Donatella Sciuto. 2010. Speeding-Up expensive evaluations in high-level synthesis using solution modeling and fitness inheritance. Comput. Intell. Exp. Optimiz. Problems (2010).Google Scholar
- Jose P. Pinilla and Steven J. E. Wilton. 2016. Enhanced source-level instrumentation for FPGA in-system debug of high-level synthesis designs. In Proceedings of the International Conference on Field Programmable Technology (FPT'16).Google Scholar
- Louis-Noel Pouchet, Peng Zhang, Ponnuswamy Sadayappan, and Jason Cong. 2013. Polyhedral-based data reuse optimization for configurable computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'13). Google Scholar
- Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, and Mark Horowitz. 2017. Programming heterogeneous systems from an image processing DSL. ACM Trans. Architect. Code Optimiz. 14, 3 (2017), 1–25. Google Scholar
- Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song et al. 2016. Going deeper with embedded FPGA platform for convolutional neural network. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'16). 26–35. Google Scholar
- Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices (2013). Google Scholar
- B. Ramakrishna Rau. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the International Symposium on Microarchitecture (MICRO'94). Google Scholar
- Oliver Reiche, M. Akif Özkan, Richard Membarth, Jürgen Teich, and Frank Hannig. 2017. Generating FPGA-Based image processing accelerators with hipacc. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17). Google Scholar
- Hongbo Rong. 2017. Programmatic control of a compiler for generating high-performance spatial hardware. Retrieved from https://arXiv:1711.07606.Google Scholar
- Zhenyuan Ruan, Tong He, Bojie Li, Peipei Zhou, and Jason Cong. 2018. ST-Accel: A high-level programming platform for streaming applications on FPGA. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).Google Scholar
- Sahand Salamat, Mohsen Imani, Behnam Khaleghi, and Tajana Rosing. 2019. F5-HD: Fast flexible FPGA-based framework for refreshing hyperdimensional computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19). Google Scholar
- Andrew G. Schmidt, Neil Steiner, Matthew French, and Ron Sass. 2012. HwPMI: An extensible performance monitoring infrastructure for improving hardware design and productivity on FPGAs. Int. J. Reconfig. Comput. (2012). Google Scholar
- Robert Schreiber, Shail Aditya, Scott Mahlke, Vinod Kathail, B. Ramakrishna Rau, Darren Cronquist, and Mukund Sivaraman. 2002. PICO-NPA: High-level synthesis of nonprogrammable hardware accelerators. J. VLSI Signal Process. Syst. Signal Image Video Technol. 31, 2 (2002), 127–142. Google Scholar
- Jocelyn Sérot, François Berry, and Sameer Ahmed. 2013. CAPH: A language for implementing stream-processing applications on FPGAs. Embed. Syst. Design FPGAs (2013).Google Scholar
- Aaron Severance, Joe Edwards, Hossein Omidian, and Guy Lemieux. 2014. Soft vector processors with streaming pipelines. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'14). Google Scholar
- Aaron Severance and Guy Lemieux. 2012. VENICE: A compact vector processor for FPGA applications. In Proceedings of the International Conference on Field Programmable Technology (FPT'12). Google Scholar
- Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kyung Kim, Chenkai Shao, Asit Mishra, and Hadi Esmaeilzadeh. 2016. From high-level deep neural models to FPGAs. In Proceedings of the International Symposium on Microarchitecture (MICRO'16). Google Scholar
- Minghua Shen and Guojie Luo. 2015. Accelerate FPGA routing with parallel recursive partitioning. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'15). Google Scholar
- Minghua Shen and Guojie Luo. 2017. Corolla: GPU-accelerated FPGA routing based on subgraph dynamic expansion. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17). Google Scholar
- Minghua Shen, Guojie Luo, and Nong Xiao. 2020. Coarse-grained parallel routing with recursive partitioning for FPGAs. IEEE Trans. Parallel Distrib. Syst. 32, 4 (2020), 884–899.Google Scholar
- Sam Skalicky, Joshua Monson, Andrew Schmidt, and Matthew French. 2018. Hot & spicy: Improving productivity with Python and HLS for FPGAs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'18).
- Atefeh Sohrabizadeh, Jie Wang, and Jason Cong. 2020. End-to-end optimization of deep learning applications. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20). Google Scholar
- Roman A. Solovyev, Alexandr A. Kalinin, Alexander G. Kustov, Dmitry V. Telpukhov, and Vladimir S. Ruhlov. 2018. FPGA implementation of convolutional neural networks with fixed-point calculations. Retrieved from https://arXiv:1808.09945.Google Scholar
- Lukas Sommer, Lukas Weber, Martin Kumm, and Andreas Koch. 2020. Comparison of arithmetic number formats for inference in sum-product networks on FPGAs. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'20).Google Scholar
- Nitish Srivastava, Hongbo Rong, Prithayan Barua, Guanyu Feng, Huanqi Cao, Zhiru Zhang, David Albonesi, Vivek Sarkar, Wenguang Chen, Paul Petersen, et al. 2019. T2S-Tensor: Productively generating high-performance spatial hardware for dense tensor computations. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).Google Scholar
- Robert Stewart, Kirsty Duncan, Greg Michaelson, Paulo Garcia, Deepayan Bhowmik, and Andrew Wallace. 2018. RIPL: A parallel image processing language for FPGAs. ACM Trans. Reconfig. Technol. Syst. 11, 1 (2018), 1–24. Google Scholar
- Ian Swarbrick, Dinesh Gaitonde, Sagheer Ahmad, Bala Jayadev, Jeff Cuppett, Abbas Morshed, Brian Gaide, and Ygal Arbel. 2019. Versal network-on-chip (NoC). In Proceedings of the Symposium on High-Performance Interconnects (Hot Interconnects'19).Google Scholar
- Synthesijer. 2020. Synthesijer GitHub. Retrieved from https://github.com/synthesijer/synthesijer.Google Scholar
- Mingxing Tan, Steve Dai, Udit Gupta, and Zhiru Zhang. 2015. Mapping-aware constrained scheduling for LUT-Based FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'15). Google Scholar
- Mingxing Tan, Gai Liu, Ritchie Zhao, Steve Dai, and Zhiru Zhang. 2015. ElasticFlow: A complexity-effective approach for pipelining irregular loop nests. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'15).
- James Thomas, Pat Hanrahan, and Matei Zaharia. 2020. Fleet: A framework for massively parallel streaming on FPGAs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'20). Google Scholar
- Tim Todman and Wayne Luk. 2013. Runtime assertions and exceptions for streaming systems. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'13).Google Scholar
- Stephen M. (Steve) Trimberger. 2018. Three ages of FPGAs: A retrospective on the first thirty years of FPGA technology: This paper reflects on how Moore's law has driven the design of FPGAs through three epochs: The age of invention, the age of expansion, and the age of accumulation. IEEE Solid-State Circ. Mag. 10, 2 (2018), 16–29.
- Ecenur Ustun, Chenhui Deng, Debjit Pal, Zhijing Li, and Zhiru Zhang. 2020. Accurate operation delay prediction for FPGA HLS using graph neural networks. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'20). Google Scholar
- Ecenur Ustun, Shaojie Xiang, Jinny Gui, Cunxi Yu, and Zhiru Zhang. 2019. LAMDA: Learning-assisted multi-stage autotuning for FPGA design closure. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).Google Scholar
- Shervin Vakili, J. M. Pierre Langlois, and Guy Bois. 2013. Enhanced precision analysis for accuracy-aware bit-width optimization using affine arithmetic. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 32, 12 (2013), 1853–1865. Google Scholar
- Anshuman Verma, Huiyang Zhou, Skip Booth, Robbie King, James Coole, Andy Keep, John Marshall, and Wu-chun Feng. 2017. Developing dynamic profiling and debugging support in OpenCL for FPGAs. In Proceedings of the Design Automation Conference (DAC'17). Google Scholar
- Chris C. Wang and Guy G. F. Lemieux. 2011. Scalable and deterministic timing-driven parallel placement for FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'11). Google Scholar
- Dekui Wang, Zhenhua Duan, Cong Tian, Bohu Huang, and Nan Zhang. 2020. ParRA: A shared memory parallel FPGA router using hybrid partitioning approach. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 4 (2020), 830–842.
- Han Wang, Robert Soulé, Huynh Tu Dang, Ki Suh Lee, Vishal Shrivastav, Nate Foster, and Hakim Weatherspoon. 2017. P4FPGA: A rapid prototyping framework for P4. In Proceedings of the Symposium on SDN Research.
- Jie Wang, Licheng Guo, and Jason Cong. 2021. AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21). Google Scholar
- Shibo Wang and Pankaj Kanwar. 2019. BFloat16: The secret to high performance on cloud TPUs. Google Cloud Blog (2019).Google Scholar
- Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Qinru Qiu, Yanzhi Wang, and Yun Liang. 2018. C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'18). Google Scholar
- Xiaojun Wang and Miriam Leeser. 2010. VFloat: A variable precision fixed- and floating-point library for reconfigurable hardware. ACM Trans. Reconfig. Technol. Syst. 16, 3 (2010), 1–23. Google Scholar
- Yu Wang, James C. Hoe, and Eriko Nurvitadhi. 2019. Processor assisted worklist scheduling for FPGA accelerated graph processing on a shared-memory platform. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'19).Google Scholar
Cross Ref
- Yuxin Wang, Peng Li, and Jason Cong. 2014. Theory and algorithm for generalized memory partitioning in high-level synthesis. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'14). Google Scholar
Digital Library
- Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, and Jason Cong. 2013. Memory partitioning for multidimensional arrays in high-level synthesis. In Proceedings of the Design Automation Conference (DAC'13). Google Scholar
Digital Library
- Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. 2017. HopliteRT: An efficient FPGA NoC for real-time applications. In Proceedings of the International Conference on Field Programmable Technology (FPT'17).Google Scholar
Cross Ref
- Richard Wei, Lane Schwartz, and Vikram Adve. 2017. DLVM: A modern compiler infrastructure for deep learning systems. Retrieved from https://arXiv:1711.03016.Google Scholar
- Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Proceedings of the Design Automation Conference (DAC'17). Google Scholar
Digital Library
- Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (2009), 65–76. Google Scholar
Digital Library
- Yuanlong Xiao, Dongjoon Park, Andrew Butt, Hans Giesen, Zhaoyang Han, Rui Ding, Nevo Magnezi, Raphael Rubin, and André DeHon. 2019. Reducing FPGA compile time with separate compilation for FPGA building blocks. In Proceedings of the International Conference on Field Programmable Technology (FPT'19).Google Scholar
Cross Ref
- Xilinx. 2012. ChipScope Pro Software and Cores (UG029). Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/chipscope_pro_sw_cores_ug029.pdf.Google Scholar
- Xilinx. 2020. SDNet Packet Processor User Guide. Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_1/ug1012-sdnet-packet-processor.pdf.Google Scholar
- Xilinx. 2020. Vitis High-Level Synthesis User Guide. Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_2/ug1399-vitis-hls.pdf.Google Scholar
- Xilinx. 2020. Zynq UltraScale+ MPSoC. Retrieved from https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html.Google Scholar
- Yu Xing, Shuang Liang, Lingzhi Sui, Xijie Jia, Jiantao Qiu, Xin Liu, Yushun Wang, Yi Shan, and Yu Wang. 2019. DNNVM: End-to-end compiler leveraging heterogeneous optimizations on FPGA-based CNN accelerators. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 39, 10, 2668–2681.Google Scholar
Cross Ref
- Li Yang, Zhezhi He, and Deliang Fan. 2018. A fully onchip binarized convolutional neural network FPGA implementation with accurate inference. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED'18). Google Scholar
Digital Library
- Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, and Kurt Keutzer. 2019. Synetgy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'19). Google Scholar
Digital Library
- Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose. 2007. Exploration and customization of FPGA-based soft processors. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 26, 2 (2007), 266–277. Google Scholar
Digital Library
- Cody Hao Yu, Peng Wei, Max Grossman, Peng Zhang, Vivek Sarker, and Jason Cong. 2018. S2FA: An accelerator automation framework for heterogeneous computing in datacenters. In Proceedings of the Design Automation Conference (DAC'18). Google Scholar
Digital Library
- Jason Yu, Guy Lemieux, and Christpher Eagleston. 2008. Vector processing as a soft-core CPU accelerator. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'08). 222–232. Google Scholar
Digital Library
- David C. Zaretsky, Gaurav Mittal, Robert P. Dick, and Prith Banerjee. 2007. Balanced scheduling and operation chaining in high-level synthesis for FPGA designs. In Proceedings of the International Symposium on Quality Electronic Design (ISQED'07). Google Scholar
Digital Library
- Hanqing Zeng and Viktor Prasanna. 2020. GraphACT: Accelerating GCN training on CPU-FPGA heterogeneous platforms. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'20). Google Scholar
Digital Library
- Yue Zha and Jing Li. 2020. Virtualizing FPGAs in the cloud. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'20). 845–858. Google Scholar
Digital Library
- Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'15). Google Scholar
Digital Library
- Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Peichen Pan, and Jason Cong. 2019. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 38, 11 (2019), 2072–2085.Google Scholar
Digital Library
- Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. 2018. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'18). Google Scholar
Digital Library
- Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, and Zhiru Zhang. 2021. FracBNN: Accurate and FPGA-efficient binary neural networks with fractional activations. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'21). Google Scholar
Digital Library
- Zhiru Zhang and Bin Liu. 2013. SDC-based modulo scheduling for pipeline synthesis. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'13). Google Scholar
Digital Library
- Jieru Zhao, Liang Feng, Sharad Sinha, Wei Zhang, Yun Liang, and Bingsheng He. 2017. COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications. In Proceedings of the International Conference on Computer-Aided Design (ICCAD'17). Google Scholar
Digital Library
- Jieru Zhao, Tingyuan Liang, Sharad Sinha, and Wei Zhang. 2019. Machine learning based routing congestion prediction in FPGA high-level synthesis. In Proceedings of the Design, Automation, and Test in Europe (DATE'19).Google Scholar
Cross Ref
- Ritchie Zhao, Weinan Song, Wentao Zhang, Tianwei Xing, Jeng-Hau Lin, Mani Srivastava, Rajesh Gupta, and Zhiru Zhang. 2017. Accelerating binarized convolutional neural networks with software-programmable fpgas. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17). Google Scholar
Digital Library
- Ritchie Zhao, Mingxing Tan, Steve Dai, and Zhiru Zhang. 2015. Area-efficient pipelining for FPGA-targeted high-level synthesis. In Proceedings of the Design Automation Conference (DAC'15). Google Scholar
Digital Library
- Zhipeng Zhao and James C. Hoe. 2017. Using vivado-HLS for structural design: A NoC case study. Retrieved from https://arXiv:1710.10290. Google Scholar
Digital Library
- Guanwen Zhong, Alok Prakash, Yun Liang, Tulika Mitra, and Smail Niar. 2016. Lin-analyzer: A high-level performance analysis tool for FPGA-based accelerators. In Proceedings of the Design Automation Conference (DAC'16). Google Scholar
Digital Library
- Shijie Zhou, Rajgopal Kannan, Viktor K. Prasanna, Guna Seetharaman, and Qing Wu. 2019. HitGraph: High-throughput graph processing framework on FPGA. IEEE Trans. Parallel Distrib. Syst. 30, 10, 2249–2264.Google Scholar
Cross Ref
- Yuan Zhou, Khalid Musa Al-Hawaj, and Zhiru Zhang. 2017. A new approach to automatic memory banking using trace-based address mining. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA'17). Google Scholar
Digital Library
- Wei Zuo, Peng Li, Deming Chen, Louis-Noël Pouchet, Shunan Zhong, and Jason Cong. 2013. Improving polyhedral code generation for high-level synthesis. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'13). Google Scholar
Digital Library