Abstract
Over the last five years, graphics cards have become a tempting target for scientific computing thanks to their unrivaled peak performance, often yielding runtime speed-ups of 10x to 25x over comparable CPU solutions.
However, this speed-up can be difficult to achieve, and doing so often requires a fundamental rethink of the algorithm. This is especially problematic in scientific computing, where domain experts do not want to learn yet another architecture.
In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies, we identify sets (partitions) of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this approach with three case studies, showing significant performance improvements over equivalent CPU methods and efficiency comparable to hand-tuned GPU implementations.
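To make the partitioning idea concrete, the following is a minimal sketch (not the paper's implementation, and in Python rather than a GPU language): a 2D dynamic-programming recurrence such as edit distance depends only on cells (i-1, j), (i, j-1), and (i-1, j-1), so a dependency analysis partitions the table into anti-diagonals. Every cell on the diagonal d = i + j is independent of the others on that diagonal, so each partition could in principle be dispatched as one parallel GPU step; here the partitions are simply iterated sequentially.

```python
def edit_distance_wavefront(a, b):
    """Compute edit distance by sweeping anti-diagonal partitions.

    Cells within one diagonal d = i + j are mutually independent,
    which is the property a GPU implementation would exploit.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i          # deleting i characters of a
    for j in range(m + 1):
        dp[0][j] = j          # inserting j characters of b
    # Iterate partition by partition; each inner loop is parallelisable.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[n][m]

print(edit_distance_wavefront("kitten", "sitting"))  # 3
```

A GPU code generator would map each anti-diagonal to one kernel launch (or one synchronisation step), with one thread per cell of the partition.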
Synthesising graphics card programs from DSLs
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation