Enhancing Design Space Exploration by Extending CPU/GPU Specifications onto FPGAs

Published: 17 February 2015

Abstract

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). Exploiting diverse target architectures typically requires developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes that range from short/medium (e.g., 8,000-bit) to long (e.g., 64,800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, either the GPU or the FPGA may be the more suitable platform, each providing a different acceleration factor over conventional multicore CPUs.
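To make the simulation-based exploration concrete, the following is a minimal Python sketch of one Monte Carlo sweep over a single design parameter, the bitwidth of the channel log-likelihood ratios (LLRs). It is not the article's OpenCL kernel. Everything in it is an assumption chosen for illustration: the (7,4) Hamming parity-check matrix stands in for a real LDPC matrix, min-sum is used as the decoding rule, the channel is BPSK over AWGN with `snr_db` read as Es/N0, only the channel LLRs are quantized, and the names `quantize`, `min_sum_decode`, and `estimate_ber` are hypothetical.

```python
import math
import random

# Toy stand-in for an LDPC parity-check matrix: the (7,4) Hamming code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
CHECKS = [[j for j, h in enumerate(row) if h] for row in H]
N = len(H[0])


def encode(data):
    """Map 4 data bits (d3, d5, d6, d7) to a 7-bit codeword that satisfies H."""
    d3, d5, d6, d7 = data
    return [d3 ^ d5 ^ d7, d3 ^ d6 ^ d7, d3, d5 ^ d6 ^ d7, d5, d6, d7]


def quantize(llr, bitwidth, max_abs=8.0):
    """Uniform signed quantizer with saturation; max_abs is an assumed range."""
    if bitwidth < 2:
        raise ValueError("bitwidth must be at least 2")
    levels = 2 ** (bitwidth - 1) - 1
    step = max_abs / levels
    q = max(-levels, min(levels, round(llr / step)))
    return q * step


def min_sum_decode(channel_llrs, max_iters=10):
    """Min-sum message passing on H; returns the hard-decision bits."""
    # Variable-to-check messages start as the channel LLRs.
    v2c = {(i, j): channel_llrs[j] for i, cols in enumerate(CHECKS) for j in cols}
    bits = [0 if llr >= 0 else 1 for llr in channel_llrs]
    for _ in range(max_iters):
        # Check-node update: product of signs, minimum magnitude of the others.
        c2v = {}
        for i, cols in enumerate(CHECKS):
            for j in cols:
                others = [v2c[(i, k)] for k in cols if k != j]
                sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
                c2v[(i, j)] = sign * min(abs(m) for m in others)
        # Variable-node update and hard decision.
        totals = list(channel_llrs)
        for (i, j), m in c2v.items():
            totals[j] += m
        bits = [0 if t >= 0 else 1 for t in totals]
        if all(sum(bits[j] for j in cols) % 2 == 0 for cols in CHECKS):
            break  # all parity checks satisfied
        for (i, j) in v2c:
            v2c[(i, j)] = totals[j] - c2v[(i, j)]
    return bits


def estimate_ber(bitwidth, snr_db, frames, seed=0):
    """Monte Carlo bit error rate for one design point (bitwidth, snr_db)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / (2.0 * 10.0 ** (snr_db / 10.0)))
    bit_errors = 0
    for _ in range(frames):
        codeword = encode([rng.randint(0, 1) for _ in range(4)])
        # BPSK: bit 0 -> +1, bit 1 -> -1, plus Gaussian noise.
        received = [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in codeword]
        llrs = [quantize(2.0 * r / sigma ** 2, bitwidth) for r in received]
        decoded = min_sum_decode(llrs)
        bit_errors += sum(d != b for d, b in zip(decoded, codeword))
    return bit_errors / (frames * N)


if __name__ == "__main__":
    # One Monte Carlo run per design point; real studies need far more frames.
    for bitwidth in (2, 3, 4, 6):
        print(bitwidth, estimate_ber(bitwidth, snr_db=3.0, frames=2000))
```

Each frame is independent of the others, which is the property that lets a simulation of this kind be spread across the parallel work-items of an OpenCL kernel on a CPU, GPU, or FPGA.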
