Vectorisation Avoidance

Abstract
Flattening nested parallelism is a vectorising code transform that converts irregular nested parallelism into flat data parallelism. Although the result has good asymptotic performance, flattening thoroughly restructures the code. Many intermediate data structures and traversals are introduced, which may or may not be eliminated by subsequent optimisation. We present a novel program analysis to identify parts of the program where flattening would only introduce overhead, without appropriate gain. We present empirical evidence that avoiding vectorisation in these cases leads to more efficient programs than if we had applied vectorisation and then relied on array fusion to eliminate intermediates from the resulting code.
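The flattening representation the abstract refers to can be sketched at the value level: a nested array becomes a flat data array paired with a segment descriptor recording the length of each subarray, and a "lifted" operation then runs over the flat data in a single pass. This is an illustrative toy, not the actual Data Parallel Haskell representation; the names `Nested`, `segLens`, `flatData`, `toNested`, and `mapL` are our own.

```haskell
-- A nested array as flat data plus a segment descriptor.
data Nested a = Nested { segLens :: [Int], flatData :: [a] }
  deriving (Eq, Show)

-- Flatten a list of lists into the segmented representation.
toNested :: [[a]] -> Nested a
toNested xss = Nested (map length xss) (concat xss)

-- Recover the nested structure from the segment descriptor.
fromNested :: Nested a -> [[a]]
fromNested (Nested [] _)      = []
fromNested (Nested (l:ls) xs) = take l xs : fromNested (Nested ls (drop l xs))

-- A lifted map works on the flat data in one pass, ignoring nesting.
mapL :: (a -> b) -> Nested a -> Nested b
mapL f (Nested lens xs) = Nested lens (map f xs)
```

For example, `toNested [[1,2],[3],[4,5,6]]` yields `Nested [2,1,3] [1,2,3,4,5,6]`. The segment descriptor is exactly the kind of intermediate structure the abstract mentions: flattening introduces it everywhere, and subsequent fusion may or may not remove it, which is what motivates avoiding vectorisation where it brings no gain.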
Published in Haskell '12: Proceedings of the 2012 Haskell Symposium.