ABSTRACT
For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-difference codes and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis, in which the key data structures are "irregular" ones such as graphs, trees, and sets.
To address the need for better abstractions, we introduce a data-centric formulation of algorithms, called the operator formulation, in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out the algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism, called amorphous data-parallelism, is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor, or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context.
These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
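To make the data-centric view concrete, below is a minimal sketch, assuming a chaotic-relaxation single-source shortest-paths kernel written in the operator-formulation style; the names Graph, relaxNode, and worklist are illustrative and are not the paper's Galois API. Each iteration applies an operator to one active node's neighborhood; amorphous data-parallelism arises when operator applications whose neighborhoods are disjoint can proceed concurrently, and the unordered worklist leaves the schedule as a degree of freedom.

```cpp
// Hypothetical sketch of the operator formulation (not the paper's API):
// single-source shortest paths as a worklist of active nodes.
#include <cstdio>
#include <deque>
#include <limits>
#include <utility>
#include <vector>

struct Graph {
    // adjacency list: adj[u] = {(v, weight), ...}
    std::vector<std::vector<std::pair<int, int>>> adj;
    std::vector<int> dist;
};

// The operator: reads the neighborhood of one active node, may update
// distances, and may create new active nodes.
void relaxNode(Graph& g, int u, std::deque<int>& worklist) {
    for (auto [v, w] : g.adj[u]) {
        if (g.dist[u] + w < g.dist[v]) {
            g.dist[v] = g.dist[u] + w;   // modify the neighborhood
            worklist.push_back(v);       // v becomes an active node
        }
    }
}

int main() {
    const int INF = std::numeric_limits<int>::max() / 2;
    Graph g;
    g.adj = {{{1, 4}, {2, 1}}, {{3, 1}}, {{1, 2}, {3, 5}}, {}};
    g.dist.assign(4, INF);
    g.dist[0] = 0;

    // Unordered worklist of active nodes: any processing order yields the
    // same final distances, so independent applications could run in parallel.
    std::deque<int> worklist = {0};
    while (!worklist.empty()) {
        int u = worklist.front();
        worklist.pop_front();
        relaxNode(g, u, worklist);
    }

    for (int v = 0; v < 4; ++v)
        std::printf("dist[%d] = %d\n", v, g.dist[v]);
    return 0;
}
```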