Abstract
Soft vector processors can augment and extend the capabilities of FPGA-based embedded systems-on-chip such as the Xilinx Zynq. However, configuring and optimizing a soft processor for best performance is hard: architectural parameters such as precision, vector lane count, vector length, chunk size, and DMA scheduling must all be considered to ensure efficient execution of code on the soft vector processing platform. To simplify the design process, we develop a compiler framework and an autotuning runtime that split the optimization into a combination of static and dynamic passes that map data-parallel computations to the soft processor. We compare the VectorBlox MXP soft vector processor against implementations running on the scalar ARM processor, the embedded NEON hard vector engine, and low-level streaming Verilog designs. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform the other organizations by up to 4× while saving ≈10% dynamic power. Our compilation and runtime framework can also outperform the GCC NEON vectorizer under certain conditions by explicitly generating NEON intrinsics and performance-tuning the autogenerated data-parallel code. When constrained by I/O bandwidth, soft vector processors are even competitive with spatial Verilog implementations of the computation.
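The trade-off between chunk size, DMA scheduling, and scratchpad capacity that an autotuner has to navigate can be illustrated with a small search. This sketch is hypothetical and is not the authors' framework or the MXP API: `Platform`, `Kernel`, `Config`, `fits`, `estimate_cycles`, and `autotune` are illustrative names, every numeric default (16 lanes, a 64 KB scratchpad, 8 bytes/cycle of DMA bandwidth, 100 cycles of DMA setup per transfer) is an assumption, and an analytic cost model stands in for timing runs on the board. A static check prunes configurations that overflow the scratchpad, and the search then picks the cheapest remaining chunk size and buffering scheme.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Platform:
    # All defaults are illustrative assumptions, not measured MXP parameters.
    lanes: int = 16                    # vector lanes, fixed when the soft core is synthesized
    scratchpad_bytes: int = 64 * 1024  # on-chip scratchpad capacity
    dma_bytes_per_cycle: float = 8.0   # sustained DMA bandwidth
    dma_setup: int = 100               # fixed cycles to launch one DMA transfer
    instr_overhead: int = 4            # fixed cycles per vector instruction


@dataclass(frozen=True)
class Kernel:
    n: int               # total number of elements to process
    elem_bytes: int = 4  # element precision in bytes (32-bit here)
    streams: int = 3     # arrays moved per element, e.g. two inputs and one output
    vec_ops: int = 2     # vector instructions issued per chunk


@dataclass(frozen=True)
class Config:
    chunk: int           # elements per DMA chunk
    double_buffer: bool  # overlap DMA of the next chunk with compute on this one


def fits(cfg: Config, k: Kernel, p: Platform) -> bool:
    """Static pass: reject configurations whose buffers overflow the scratchpad."""
    buffers = 2 if cfg.double_buffer else 1
    return cfg.chunk * k.elem_bytes * k.streams * buffers <= p.scratchpad_bytes


def estimate_cycles(cfg: Config, k: Kernel, p: Platform) -> float:
    """Hypothetical stand-in for on-board timing: a simple analytic cost model.

    Every chunk, including a partial last one, is costed as a full chunk.
    """
    chunks = -(-k.n // cfg.chunk)  # ceiling division
    dma = p.dma_setup + cfg.chunk * k.elem_bytes * k.streams / p.dma_bytes_per_cycle
    compute = k.vec_ops * (-(-cfg.chunk // p.lanes) + p.instr_overhead)
    if cfg.double_buffer:
        # Fill the pipeline with one transfer, overlap the middle chunks,
        # then drain with the compute on the final chunk.
        return dma + (chunks - 1) * max(dma, compute) + compute
    # Without double buffering, transfer and compute serialize for every chunk.
    return chunks * (dma + compute)


def autotune(k: Kernel, p: Platform) -> Config:
    """Pick the cheapest legal (chunk, double_buffer) pair for this kernel."""
    candidates = [
        Config(chunk, db)
        for chunk, db in product([2 ** e for e in range(6, 14)], (False, True))
    ]
    legal = [cfg for cfg in candidates if fits(cfg, k, p)]
    return min(legal, key=lambda cfg: estimate_cycles(cfg, k, p))


if __name__ == "__main__":
    kernel, platform = Kernel(n=1 << 16), Platform()
    best = autotune(kernel, platform)
    print(best, estimate_cycles(best, kernel, platform))
```

Running the module prints `Config(chunk=2048, double_buffer=True) 101768.0`. Under these assumed numbers the search settles on the largest chunk that still fits when double-buffered: the fixed per-transfer setup cost favors large chunks, while the scratchpad caps how large a chunk can be once two buffers per stream are reserved. Replacing `estimate_cycles` with a function that times the kernel on hardware would leave the search loop unchanged.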
Optimizing Soft Vector Processing in FPGA-Based Embedded Systems