Abstract
Task graphs or dependence graphs are used in runtime systems to schedule tasks for parallel execution. In problem domains such as dense linear algebra and signal processing, dependence graphs can be generated from a program by static analysis. However, in emerging problem domains such as graph analytics, the set of tasks and dependences between tasks in a program are complex functions of runtime values and cannot be determined statically. In this paper, we introduce a novel approach for exploiting parallelism in such programs. This approach is based on a data structure called the kinetic dependence graph (KDG), which consists of a dependence graph together with update rules that incrementally update the graph to reflect changes in the dependence structure whenever a task is completed.
We have implemented a simple programming model that allows programmers to write these applications at a high level of abstraction, and a runtime within the Galois system [15] that builds the KDG automatically and executes the program in parallel. On a suite of programs that are otherwise difficult to parallelize, we have obtained speedups of up to 33x on 40 cores, outperforming third-party implementations in many cases.
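To make the idea of a dependence graph with update rules concrete, the following is a minimal Python sketch. It is not the paper's programming model or the Galois runtime API. The names `Task`, `ToyKDG`, `run`, and `demo` are hypothetical, and so is the conflict rule (two tasks depend on each other when they touch a common location). The sketch also assumes that every newly created task is ordered after the tasks already running, so every source of the graph is safe to execute. Cases where that assumption fails are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    priority: int         # position in the sequential order (smaller is earlier)
    name: str
    locations: frozenset  # abstract data items the task reads or writes

    def key(self):
        return (self.priority, self.name)


class ToyKDG:
    """Illustrative only: a dependence graph plus two update rules."""

    def __init__(self):
        self.preds = {}  # task -> set of earlier tasks it must wait for

    def add_task(self, task):
        # Update rule (assumed): link the new task to every existing task
        # that shares a location, with the edge going from earlier to later.
        preds = set()
        for other in self.preds:
            if other.locations & task.locations:
                if other.key() < task.key():
                    preds.add(other)
                else:
                    self.preds[other].add(task)
        self.preds[task] = preds

    def sources(self):
        # Tasks with no pending predecessors can run in parallel.
        ready = [t for t, p in self.preds.items() if not p]
        return sorted(ready, key=Task.key)

    def complete(self, task, new_tasks=()):
        # Update rule (assumed): drop the finished task and its edges,
        # then insert any tasks it created.
        del self.preds[task]
        for p in self.preds.values():
            p.discard(task)
        for t in new_tasks:
            self.add_task(t)


def run(kdg, execute):
    """Execute round by round; returns the task names run in each round."""
    rounds = []
    while kdg.preds:
        ready = kdg.sources()
        rounds.append([t.name for t in ready])
        for t in ready:  # conceptually parallel, sequential in this sketch
            kdg.complete(t, execute(t))
    return rounds


def demo():
    kdg = ToyKDG()
    kdg.add_task(Task(1, "a", frozenset({"x"})))
    kdg.add_task(Task(2, "b", frozenset({"x", "y"})))
    kdg.add_task(Task(3, "c", frozenset({"z"})))

    def execute(task):
        # Only task "a" creates new work, and only at run time.
        if task.name == "a":
            return [Task(4, "d", frozenset({"z"}))]
        return []

    return run(kdg, execute)


if __name__ == "__main__":
    print(demo())
```

In this toy run, the first round executes `a` and `c`. Completing them removes the edge that blocked `b` and adds the task `d` that `a` created, so `b` and `d` become the sources of the second round. The point of the sketch is only that the graph is repaired locally after each task finishes rather than rebuilt from scratch.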
- S. Ainsley, E. Vouga, E. Grinspun, and R. Tamstorf. Speculative Parallel Asynchronous Contact Mechanics. ACM Transactions on Graphics (SIGGRAPH Asia), 31(6):8, Nov 2012.
- B. J. Alder and T. E. Wainwright. Studies in molecular dynamics. I. General method. J. Chem. Phys., 31(2):459, 1959.
- J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(4), December 1986.
- J. Basch, L. J. Guibas, and J. Hershberger. Data structures for mobile data. In Journal of Algorithms, pages 747--756, 1997.
- G. Blelloch. Programming parallel algorithms. Communications of the ACM, 39(3), March 1996.
- G. E. Blelloch, J. T. Fineman, P. B. Gibbons, and J. Shun. Internally deterministic parallel algorithms can be fast. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 181--192, New York, NY, USA, 2012. ACM.
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207--216, 1995.
- M. G. Burk, K. Knobe, R. Newton, and V. Sarkar. The concurrent collections programming model. Technical Report TR 10--12, Department of Computer Science, Rice University, December 2010.
- K. M. Chandy and J. Misra. Distributed simulation: A case study in design and verification of distributed programs. IEEE Trans. Software Eng., 5(5):440--452, 1979.
- P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 519--538, New York, NY, USA, 2005. ACM.
- M. Cintra, J. F. Martínez, and J. Torrellas. Architectural support for scalable speculative parallelization in shared-memory multiprocessors. In International Symposium on Computer Architecture (ISCA), pages 13--24, Vancouver, Canada, June 2000.
- T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms. MIT Press, 2001.
- T. A. Davis. Direct Methods for Sparse Linear Systems (Fundamentals of Algorithms 2). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006.
- A. Duran, X. Teruel, R. Ferrer, X. Martorell, and E. Ayguade. Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In Proceedings of the 2009 International Conference on Parallel Processing, ICPP '09, pages 124--131, Washington, DC, USA, 2009. IEEE Computer Society.
- http://iss.ices.utexas.edu/?p=projects/galois.
- T. Gautier, J. V. Lima, N. Maillard, and B. Raffin. Xkaapi: A runtime system for data-flow task programming on heterogeneous architectures. Parallel and Distributed Processing Symposium, International, 0:1299--1308, 2013.
- A. Gupta, S. Koric, and T. George. Sparse matrix factorization on massively parallel computers. In SC '09, 2009.
- D. Harmon, E. Vouga, B. Smith, R. Tamstorf, and E. Grinspun. Asynchronous contact mechanics. In ACM SIGGRAPH 2009 papers, SIGGRAPH '09, pages 87:1--87:12, New York, NY, USA, 2009. ACM.
- J.-C. Huang, X. Jiao, R. M. Fujimoto, and H. Zha. Dag-guided parallel asynchronous variational integrators with super-elements. In Proceedings of the 2007 summer computer simulation conference, SCSC, pages 691--697, San Diego, CA, USA, 2007. Society for Computer Simulation International.
- Intel Corporation. Intel thread building blocks 2.0. http://osstbb.intel.com.
- D. R. Jefferson. Virtual time. ACM TOPLAS, 7(3), 1985.
- F. John. Partial Differential Equations. Springer Publishers, 1984.
- K. Kennedy and J. Allen, editors. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann, 2001.
- V. Krishnan and J. Torrellas. A chip-multiprocessor architecture with speculative multithreading. IEEE Trans. Comput., 48(9):866--880, Sept. 1999.
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI), 42(6):211--222, 2007.
- C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, 2010.
- A. Lew, J. E. Marsden, M. Ortiz, and M. West. Asynchronous variational integrators. Archive for Rational Mechanics and Analysis, 2003.
- B. Lubachevsky. Simulating colliding rigid disks in parallel using bounded lag without time warp. In SCS Multiconference, 1990.
- P. Marcuello, A. González, and J. Tubella. Speculative multithreaded processors. In Proceedings of the 12th international conference on Supercomputing, ICS '98, pages 77--84, New York, NY, USA, 1998. ACM.
- J. Misra. Distributed discrete-event simulation. ACM Comput. Surv., 18(1):39--65, 1986.
- D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 456--471, New York, NY, USA, 2013. ACM.
- OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 4.0, July, 2013.
- J. Perez, R. Badia, and J. Labarta. A dependency-aware task-based programming environment for multi-core architectures. In Cluster Computing, 2008 IEEE International Conference on, pages 142--151, Sept 2008.
- K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The Tao of Parallelism in Algorithms. In PLDI 2011, pages 12--25, New York, NY, USA, 2011. ACM.
- H. Plummer. On the problem of distribution in globular star clusters. Mon. Not. R. Astron. Soc., 71(460), 1911.
- L. Rauchwerger and D. A. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.
- J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. In ISCA '00, 2000.
- http://icl.cs.utk.edu/dague/.
- http://www.cs.utexas.edu/~flame/web/.
- http://www.spiral.net/.
Kinetic Dependence Graphs
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems