FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL

Published: 09 January 2019

Abstract

The Square Kilometre Array (SKA) project will be the world’s largest radio telescope array. Because of its large number of antennas, the volume of signals that must be processed is enormous. One important element of the SKA’s Central Signal Processor package is pulsar search. This article focuses on the FPGA-based acceleration of the Frequency-Domain Acceleration Search module, which is part of the SKA pulsar search engine. In this module, the frequency-domain input signals have to be processed by 85 finite impulse response (FIR) filters within a tight time limit and for thousands of input arrays. Because of the large input length and FIR filter size, even high-end FPGA devices cannot parallelise the task completely. We start by investigating both the time-domain FIR filter (TDFIR) and the frequency-domain FIR filter (FDFIR) to tackle this task. We applied the overlap-add algorithm to split the coefficient array of TDFIR and the overlap-save algorithm to split the input signals of FDFIR. To achieve fast prototyping, we employed OpenCL, a high-level FPGA development technique. Performance and power consumption are evaluated using multiple FPGA devices simultaneously and compared with GPU results, which were obtained by porting the FPGA-based OpenCL kernels. The experimental evaluation shows that the FDFIR solution is very competitive in terms of performance, with a clear energy consumption advantage over the GPU solution.
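The two splitting strategies named in the abstract can be illustrated with a small pure-Python sketch. It is not the article's OpenCL kernels: the function names (`tdfir`, `tdfir_split`, `fdfir_overlap_save`), the block sizes, and the toy data are all assumptions made for illustration. In particular, `tdfir_split` is only one plausible reading of splitting the coefficient array, where each chunk of taps acts as a shorter filter and the delayed partial outputs are summed. `fdfir_overlap_save` is the textbook overlap-save method, where the input is cut into overlapping blocks, each block is filtered by FFT multiplication, and the wrapped-around outputs are discarded.

```python
import cmath

def fft(x, inverse=False):
    """Unscaled recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT with the 1/n scaling applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def tdfir(x, h):
    """Direct time-domain FIR: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def tdfir_split(x, h, taps_per_part):
    """Split the coefficient array into chunks (assumed scheme).

    Each chunk is a shorter FIR filter; its output is delayed by the
    chunk's starting tap index and accumulated into the result.
    """
    y = [0j] * len(x)
    for start in range(0, len(h), taps_per_part):
        partial = tdfir(x, h[start:start + taps_per_part])
        for n in range(start, len(x)):
            y[n] += partial[n - start]
    return y

def fdfir_overlap_save(x, h, fft_size):
    """Frequency-domain FIR using overlap-save on the input signal."""
    m = len(h)
    step = fft_size - (m - 1)           # valid outputs produced per block
    assert step > 0, "fft_size must exceed len(h) - 1"
    H = fft(list(h) + [0j] * (fft_size - m))
    padded = [0j] * (m - 1) + list(x)   # zero history before the first sample
    y = []
    for pos in range(0, len(x), step):
        block = padded[pos:pos + fft_size]
        block += [0j] * (fft_size - len(block))   # pad the final block
        Y = [a * b for a, b in zip(fft(block), H)]
        y.extend(ifft(Y)[m - 1:])       # drop the m - 1 circularly wrapped outputs
    return y[:len(x)]

# Toy complex data (illustrative only, not the 85-filter SKA workload).
x = [complex(n % 7 - 3, (n * 3) % 5 - 2) for n in range(40)]
h = [complex(k + 1, -k) for k in range(5)]

reference = tdfir(x, h)
split_result = tdfir_split(x, h, taps_per_part=2)
fd_result = fdfir_overlap_save(x, h, fft_size=16)
```

On this toy input, all three functions should agree to within floating-point rounding. The block-wise structure is what lets a long input or a long filter be processed in pieces that fit the available hardware resources.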

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 4
December 2018
93 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3303942
Editor: Steve Wilton

Copyright © 2019 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 9 January 2019
Accepted: 1 August 2018
Revised: 1 May 2018
Received: 1 December 2016
