Abstract
Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divide-and-conquer paradigm. However, direct programming with list homomorphisms is a challenge for many programmers. In this paper, we propose and implement a novel system that can automatically derive cost-optimal list homomorphisms from a pair of sequential programs, based on the third homomorphism theorem. Our idea is to reduce the extraction of list homomorphisms to the derivation of weak right inverses. We show that a weak right inverse always exists and can be automatically generated from a wide class of sequential programs. We demonstrate our system with several nontrivial examples, including the maximum prefix sum problem, the prefix sum computation, the maximum segment sum problem, and the line-of-sight problem. The experimental results show the practical efficiency of our automatic parallelization algorithm and good speedups of the generated parallel programs.
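To make the role of the weak right inverse concrete, the following is a minimal sketch for the maximum prefix sum problem, not the output of the system described here. It assumes the sequential function `h` returns the pair (maximum prefix sum, total sum), with the empty prefix counted so the maximum is never negative. The function names, the hand-written inverse `weak_right_inverse`, and the `chunk` parameter are all illustrative assumptions. A weak right inverse `g` of `h` only needs to satisfy `h(g(h(xs))) == h(xs)`; the combining operator is then obtained as `h(g(a) + g(b))`.

```python
from functools import reduce


def h(xs):
    """Sequential specification: (maximum prefix sum, total sum).

    The empty prefix is allowed, so the maximum prefix sum is >= 0.
    """
    best = total = 0
    for x in xs:
        total += x
        best = max(best, total)
    return (best, total)


def weak_right_inverse(result):
    """Hand-written weak right inverse (an assumption for illustration).

    For any (m, s) in the range of h we have m >= 0 and m >= s,
    so the two-element list [m, s - m] is mapped back to (m, s) by h.
    """
    m, s = result
    return [m, s - m]


def combine(a, b):
    """Associative operator derived from h and its weak right inverse."""
    return h(weak_right_inverse(a) + weak_right_inverse(b))


def h_divide_and_conquer(xs, chunk=4):
    """Split xs into chunks, apply h to each, and merge with combine.

    The per-chunk calls are independent, so they could run in parallel;
    here they run sequentially to keep the sketch self-contained.
    """
    if not xs:
        return h([])
    chunks = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    partials = [h(c) for c in chunks]
    return reduce(combine, partials)
```

Because each inverse produces a list of constant length, `combine` runs in constant time, so merging the partial results costs no more than a reduction over the number of chunks.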