skip to main content
research-article

Synthesising graphics card programs from DSLs

Published:11 June 2012Publication History
Skip Abstract Section

Abstract

Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions.

However, this increase can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture.

In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies we identify sets - partitions - of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this with three case studies, showing significant performance improvement over equivalent CPU methods, and similar efficiency to hand-tuned GPU implementations.

References

  1. C. Bastoul. Code generation in the polyhedral model is easier than you think. In PACT'13 IEEE International Conference on Parallel Architecture and Compilation Techniques, pages 7--16, Juan-les-Pins, France, September 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. H. Chafi, Z. DeVito, A. Moors, T. Rompf, A. K. Sujeeth, P. Hanrahan, M. Odersky, and K. Olukotun. Language virtualization for heterogeneous parallel computing. In OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, pages 835--847, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0203-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, July 1999. ISBN 0521629713.Google ScholarGoogle Scholar
  4. S. Eddy. HMMer Website, including User Manual. http://hmmer.wustl.edu.Google ScholarGoogle Scholar
  5. C. Elliott. Programming graphics processors functionally. In Proceedings of the 2004 ACM SIGPLAN workshop on Haskell, Haskell '04, pages 45--56, New York, NY, USA, 2004. ACM. ISBN 1-58113-850-4. http://doi.acm.org/10.1145/1017472.1017482. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Giegerich and C. Meyer. Algebraic dynamic programming. In AMAST '02: Proceedings of the 9th International Conference on Algebraic Methodology and Software Technology, pages 349--364, London, UK, 2002. Springer-Verlag. ISBN 3-540-44144-1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn. FLAME: Formal Linear Algebra Methods Environment. ACM Transactions on Mathematical Software, 27 (4): 422--455, Dec. 2001. ISSN 0098-3500. URL http://doi.acm.org/10.1145/504210.504213. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. L. P. Jones and S. Singh. A tutorial on parallel and concurrent programming in haskell. In P. W. M. Koopman, R. Plasmeijer, and S. D. Swierstra, editors, Advanced Functional Programming, volume 5832 of Lecture Notes in Computer Science, pages 267--305. Springer, 2008. ISBN 978-3-642-04651-3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. M. Karp and M. Held. Finite-state processes and dynamic programming. SIAM Journal on Applied Mathematics, 15 (3): 693--718, 1967. 10.1137/0115060.Google ScholarGoogle ScholarCross RefCross Ref
  10. A. Krogh, I. S. Mian, and D. Haussler. A hidden markov model that finds genes in e.coli dna. Nucleic Acids Research, 22 (22): 4768--4778, 1994. 10.1093/nar/22.22.4768.Google ScholarGoogle ScholarCross RefCross Ref
  11. L. Lamport. The parallel execution of do loops. Communications of The ACM, 17: 83--93, February 1974. 10.1145/360827.360844. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Lengauer. Loop parallelization in the polytope model. In CONCUR '93, Lecture Notes in Computer Science 715, pages 398--416. Springer-Verlag, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y. Liu, B. Schmidt, and D. Maskell. Cudasw 2.0: enhanced smith-waterman protein database search on cuda-enabled gpus based on simt and virtualized simd abstractions. BMC Research Notes, 3 (1): 93, 2010. ISSN 1756-0500. 10.1186/1756-0500-3-93.Google ScholarGoogle Scholar
  14. G. Lunter. HMMoC a compiler for hidden Markov models. Bioinformatics, 23 (18): 2485--2487, September 2007. 10.1093/bioinformatics/btm350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Mainland and G. Morrisett. Nikola: embedding compiled gpu functions in haskell. In Haskell '10: Proceedings of the third ACM Haskell symposium on Haskell, pages 67--78, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0252-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. W. R. Pearson and D. J. Lipman. Improved tools for biological sequence comparison. Proceedings of The National Academy of Sciences, 85: 2444--2448, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  17. P. Steffen, R. Giegerich, and M. Giraud. Gpu parallelization of algebraic dynamic programming. 2009. URL HAL:http://hal.archives-ouvertes.fr/inria-00438219/en/. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Svensson, K. Claessen, and M. Sheeran. Gpgpu kernel implementation and refinement using obsidian. Procedia Computer Science, 1 (1): 2065--2074, 2010. ISSN 1877-0509. DOI: 10.1016/j.procs.2010.04.231. ICCS 2010.Google ScholarGoogle ScholarCross RefCross Ref
  19. A. van Deursen, P. Klint, and J. Visser. Domain-specific languages: an annotated bibliography. SIGPLAN Not., 35: 26--36, June 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Verge, C. Mauras, and P. Quinton. The alpha language and its use for the design of systolic arrays. The Journal of VLSI Signal Processing, 3: 173--182, 1991. ISSN 0922-5773. URL http://dx.doi.org/10.1007/BF00925828. 10.1007/BF00925828. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. P. Walters, V. Balu, S. Kompalli, and V. Chaudhary. Evaluating the use of gpus in liver image segmentation and hmmer database searches. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1--12, Washington, DC, USA, 2009. IEEE Computer Society. ISBN 978-1-4244-3751-1. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Synthesising graphics card programs from DSLs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 47, Issue 6
        PLDI '12
        June 2012
        534 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2345156
        Issue’s Table of Contents
        • cover image ACM Conferences
          PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2012
          572 pages
          ISBN:9781450312059
          DOI:10.1145/2254064

        Copyright © 2012 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 June 2012

        Check for updates

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!