Abstract
Field-Programmable Gate Arrays (FPGAs) can yield higher performance and lower power than software solutions on CPUs or GPUs. However, designing with FPGAs requires specialized hardware design skills and hours-long CAD processing times. To reduce and accelerate the design effort, we can implement an overlay architecture on the FPGA, on which we can then more easily construct the desired system, but at a large cost in performance and area relative to a direct FPGA implementation. In this work, we compare the micro-architecture, performance, and area of two soft-processor overlays: the Octavo multi-threaded soft-processor and the MXP soft vector processor. To measure the area and performance penalties of these overlays relative to the underlying FPGA hardware, we compare them against direct FPGA implementations of the micro-benchmarks, both written in C and synthesized with the LegUp HLS tool and written directly in the Verilog HDL. Overall, Octavo’s higher operating frequency and MXP’s more efficient code execution result in similar performance from both, within an order of magnitude of direct FPGA implementations, but with a penalty of an order of magnitude greater area.
Microarchitectural Comparison of the MXP and Octavo Soft-Processor FPGA Overlays