Abstract
Existing approaches to higher-order vectorisation, also known as flattening nested data parallelism, do not preserve the asymptotic work complexity of the source program. Straightforward examples, such as sparse matrix-vector multiplication, can suffer a severe blow-up in both time and space, which limits the practicality of this method. We discuss why this problem arises, identify the mis-handling of index space transforms as the root cause, and present a solution using a refined representation of nested arrays. We have implemented this solution in Data Parallel Haskell (DPH) and present benchmarks showing that realistic programs, which used to suffer the blow-up, now have the correct asymptotic work complexity. In some cases, the asymptotic complexity of the vectorised program is even better than the original.
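The abstract's running example, sparse matrix-vector multiplication, is the canonical illustration of nested data parallelism: an outer parallel map over rows of irregular length, each performing an inner map and sum. The sketch below uses plain Haskell lists as a stand-in for DPH's parallel arrays (the names `smvm` and `flatten` are illustrative, not the paper's API), showing both the nested formulation whose O(nnz) work complexity vectorisation must preserve, and the flat segmented representation that flattening produces.

```haskell
-- A sparse matrix is a nested structure: one row of
-- (column index, value) pairs per matrix row.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]
type Vector       = [Double]

-- Nested formulation: an outer map over rows, an inner map-and-sum
-- per row. Its work is proportional to the number of non-zeros (nnz);
-- a work-efficient vectoriser must preserve this bound.
smvm :: SparseMatrix -> Vector -> Vector
smvm m v = map (\row -> sum [ x * (v !! i) | (i, x) <- row ]) m

-- Flattened (segmented) representation: the nested rows become a
-- single flat data array plus a segment descriptor of row lengths.
flatten :: SparseMatrix -> ([(Int, Double)], [Int])
flatten m = (concat m, map length m)

main :: IO ()
main = do
  let m = [ [(0, 1.0), (2, 3.0)]   -- row 0: entries in columns 0 and 2
          , []                     -- row 1: an empty row
          , [(1, 2.0)] ]           -- row 2: one entry in column 1
      v = [10, 20, 30]
  print (smvm m v)                 -- [100.0,0.0,40.0]
  print (flatten m)                -- ([(0,1.0),(2,3.0),(1,2.0)],[2,0,1])
```

The empty row demonstrates the irregularity that makes this example nested rather than flat data parallelism: row lengths vary, so the inner computations cannot be padded to a common size without changing the work complexity.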
Work efficient higher-order vectorisation. In ICFP '12: Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming.