research-article

Optimizing Soft Vector Processing in FPGA-Based Embedded Systems

Published: 20 May 2016

Abstract

Soft vector processors can augment and extend the capability of FPGA-based embedded systems-on-chip such as the Xilinx Zynq. However, configuring and optimizing the soft processor for best performance is hard. We must consider architectural parameters such as precision, vector lane count, vector length, chunk size, and DMA scheduling to ensure efficient execution of code on the soft vector processing platform. To simplify the design process, we develop a compiler framework and an autotuning runtime that splits the optimization into a combination of static and dynamic passes that map data-parallel computations to the soft processor. We compare and contrast implementations running on the scalar ARM processor, the embedded NEON hard vector engine, and low-level streaming Verilog designs with the VectorBlox MXP soft vector processor. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform other organizations by up to 4× while saving ≈10% dynamic power. Our compilation and runtime framework is also able to outperform the gcc NEON vectorizer under certain conditions by explicit generation of NEON intrinsics and performance tuning of the autogenerated data-parallel code. When constrained by IO bandwidth, soft vector processors are even competitive with spatial Verilog implementations of computation.
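To make the role of chunk size and DMA scheduling more concrete, the following is a minimal sketch of how a tuner might choose a chunk size for a double-buffered scratchpad. It is not the paper's autotuner: the cost model, the function names (`estimated_cycles`, `autotune_chunk`), and all cycle constants are hypothetical and chosen only for illustration. The sketch assumes that DMA for the next chunk overlaps with compute on the current one, and that two chunk-sized buffers must fit in the scratchpad.

```python
from math import ceil


def estimated_cycles(n, chunk, lanes, dma_setup=50, dma_per_elem=1, ops_per_elem=2):
    """Toy cycle estimate for processing n elements in equal-sized chunks.

    Hypothetical model: each DMA transfer costs a fixed setup plus a
    per-element cost, and compute proceeds `lanes` elements at a time.
    With double buffering, the steady-state cost per chunk is the slower
    of DMA and compute.
    """
    chunks = ceil(n / chunk)
    dma = dma_setup + chunk * dma_per_elem
    compute = ceil(chunk / lanes) * ops_per_elem
    # Prologue: first DMA load. Epilogue: compute on the last chunk.
    return dma + (chunks - 1) * max(dma, compute) + compute


def autotune_chunk(n, lanes, scratchpad_elems, candidates):
    """Pick the candidate chunk size with the lowest estimated cost.

    Only chunk sizes that leave room for two buffers (double buffering)
    in the scratchpad are considered.
    """
    feasible = [c for c in candidates if 2 * c <= scratchpad_elems]
    if not feasible:
        raise ValueError("no candidate chunk size fits in the scratchpad")
    return min(feasible, key=lambda c: estimated_cycles(n, c, lanes))


if __name__ == "__main__":
    best = autotune_chunk(
        n=4096,
        lanes=4,
        scratchpad_elems=2048,
        candidates=[64, 128, 256, 512, 1024, 2048],
    )
    print(best)  # 512 under the toy constants above
```

Under these made-up constants, small chunks pay the DMA setup cost many times, while very large chunks lengthen the non-overlapped compute on the final chunk or no longer fit in the scratchpad, so an intermediate size wins. A real tuner would replace the analytical model with measurements on the target device.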



    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 9, Issue 3
      Special Issue on Reconfigurable Components with Source Code
      September 2016
      128 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/2940351
      • Editor: Steve Wilton

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 May 2016
      • Accepted: 1 September 2015
      • Revised: 1 May 2015
      • Received: 1 January 2015


      Qualifiers

      • research-article
      • Research
      • Refereed
