research-article

Automatic Hierarchical Parallelization of Linear Recurrences

Published: 19 March 2018

Abstract

Linear recurrences encompass many fundamental computations including prefix sums and digital filters. Later result values depend on earlier result values in recurrences, making it a challenge to compute them in parallel. We present a new work- and space-efficient algorithm to compute linear recurrences that is amenable to automatic parallelization and suitable for hierarchical massively-parallel architectures such as GPUs. We implemented our approach in a domain-specific code generator that emits optimized CUDA code. Our evaluation shows that, for standard prefix sums and single-stage IIR filters, the generated code reaches the throughput of memory copy for large inputs, which cannot be surpassed. On higher-order prefix sums, it performs nearly as well as the fastest handwritten code from the literature. On tuple-based prefix sums and digital filters, our automatically parallelized code outperforms the fastest prior implementations.
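To make the terminology concrete, the following is a minimal Python sketch of what a linear recurrence computes and why chunks of it can still be processed independently. It is an illustration under stated assumptions, not the algorithm or the CUDA code generator described in the abstract: the function names, the zero initial condition, and the chunk-plus-carry correction scheme are all hypothetical choices made for this example. A standard prefix sum corresponds to the single coefficient `[1]`.

```python
def linear_recurrence_serial(x, coeffs):
    """Serial reference: y[i] = x[i] + sum_k coeffs[k] * y[i-1-k].

    Assumes y[j] = 0 for j < 0 (an assumption of this sketch).
    """
    y = []
    for i, xi in enumerate(x):
        acc = xi
        for k, a in enumerate(coeffs):
            if i - 1 - k >= 0:
                acc += a * y[i - 1 - k]
        y.append(acc)
    return y


def first_order_chunked(x, a, chunk):
    """First-order recurrence y[i] = x[i] + a * y[i-1], evaluated per chunk.

    Each chunk is first solved with a zero carry-in, which needs no data
    from other chunks and could therefore run in parallel. The result is
    then corrected with the carry from the previous chunk: element j of a
    chunk receives a**(j+1) * carry. The carry propagation is done
    sequentially here for simplicity.
    """
    y = []
    carry = 0
    for start in range(0, len(x), chunk):
        local = linear_recurrence_serial(x[start:start + chunk], [a])
        factor = a
        for j in range(len(local)):
            local[j] += factor * carry
            factor *= a
        y.extend(local)
        carry = y[-1]
    return y


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(linear_recurrence_serial(data, [1]))   # standard prefix sum
    print(first_order_chunked(data, 2, 2))       # chunked first-order recurrence
```

The chunked version returns the same values as the serial reference because the recurrence is linear: the contribution of the carry-in can be added after the fact instead of being threaded through every element.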

Published in

ACM SIGPLAN Notices, Volume 53, Issue 2 (ASPLOS '18), February 2018, 809 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/3296957

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 2018, 827 pages. ISBN: 9781450349116. DOI: 10.1145/3173162

Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
