Kinetic Dependence Graphs

Published: 14 March 2015

Abstract

Task graphs or dependence graphs are used in runtime systems to schedule tasks for parallel execution. In problem domains such as dense linear algebra and signal processing, dependence graphs can be generated from a program by static analysis. However, in emerging problem domains such as graph analytics, the set of tasks and dependences between tasks in a program are complex functions of runtime values and cannot be determined statically. In this paper, we introduce a novel approach for exploiting parallelism in such programs. This approach is based on a data structure called the kinetic dependence graph (KDG), which consists of a dependence graph together with update rules that incrementally update the graph to reflect changes in the dependence structure whenever a task is completed.

We have implemented a simple programming model that allows programmers to write these applications at a high level of abstraction, and a runtime within the Galois system [15] that builds the KDG automatically and executes the program in parallel. On a suite of programs that are otherwise difficult to parallelize, we have obtained speedups of up to 33 on 40 cores, outperforming third-party implementations in many cases.
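The paper itself gives no code here, but the core idea of the abstract can be illustrated with a minimal sequential sketch. All class and function names below are hypothetical (they are not the Galois API), a real KDG runtime would revise edges incrementally rather than rebuilding them, and it would execute all source tasks in parallel; this sketch only shows how a dependence graph plus a per-completion update rule can schedule tasks whose dependences are functions of runtime values:

```python
class KineticDependenceGraph:
    """Hypothetical, sequential sketch of a kinetic dependence graph (KDG).

    The KDG holds the current task set and the dependences among tasks.
    Because dependences may depend on runtime values, a user-supplied
    update rule runs after every task completion, and the edges are then
    revised (here, rebuilt from scratch for clarity; the real system
    updates them incrementally).
    """

    def __init__(self, tasks, depends, update_rule):
        self.tasks = set(tasks)
        self.depends = depends          # depends(a, b) -> True if a must wait for b
        self.update_rule = update_rule  # called as update_rule(finished_task, remaining_tasks)
        self._rebuild_edges()

    def _rebuild_edges(self):
        # incoming[t] = set of unfinished tasks that t still waits on
        self.incoming = {
            t: {u for u in self.tasks if u != t and self.depends(t, u)}
            for t in self.tasks
        }

    def sources(self):
        # Tasks with no incoming dependences are safe to run (in parallel,
        # in a real runtime).
        return [t for t in self.tasks if not self.incoming[t]]

    def complete(self, task):
        self.tasks.discard(task)
        # The update rule may change runtime state that `depends` reads,
        # so the dependence structure is revised after each completion.
        self.update_rule(task, self.tasks)
        self._rebuild_edges()


def run_all(kdg, execute):
    """Drain the KDG: run a source task, then update the graph; repeat."""
    order = []
    while kdg.tasks:
        ready = kdg.sources()
        assert ready, "no runnable task: dependence cycle"
        t = ready[0]
        execute(t)
        order.append(t)
        kdg.complete(t)
    return order


# Example: priority-ordered tasks (discrete-event-simulation style),
# where a task must wait for every unfinished task of smaller priority.
priorities = {"a": 3, "b": 1, "c": 2}
kdg = KineticDependenceGraph(
    tasks=priorities,
    depends=lambda t, u: priorities[u] < priorities[t],
    update_rule=lambda finished, remaining: None,  # no state change in this toy example
)
print(run_all(kdg, execute=lambda t: None))  # ['b', 'c', 'a']
```

In this toy example the dependences happen to be static, so the update rule does nothing; the interesting case, which the paper targets, is when `depends` reads mutable runtime state and the update rule changes that state, so the edge set genuinely shifts as tasks finish.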

References

  1. S. Ainsley, E. Vouga, E. Grinspun, and R. Tamstorf. Speculative parallel asynchronous contact mechanics. ACM Transactions on Graphics (SIGGRAPH Asia), 31(6):8, Nov. 2012.
  2. B. J. Alder and T. E. Wainwright. Studies in molecular dynamics. I. General method. J. Chem. Phys., 31(2):459, 1959.
  3. J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(4), December 1986.
  4. J. Basch, L. J. Guibas, and J. Hershberger. Data structures for mobile data. Journal of Algorithms, pages 747--756, 1997.
  5. G. Blelloch. Programming parallel algorithms. Communications of the ACM, 39(3), March 1996.
  6. G. E. Blelloch, J. T. Fineman, P. B. Gibbons, and J. Shun. Internally deterministic parallel algorithms can be fast. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 181--192, New York, NY, USA, 2012. ACM.
  7. R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. SIGPLAN Not., 30(8):207--216, 1995.
  8. M. G. Burke, K. Knobe, R. Newton, and V. Sarkar. The Concurrent Collections programming model. Technical Report TR 10-12, Department of Computer Science, Rice University, December 2010.
  9. K. M. Chandy and J. Misra. Distributed simulation: A case study in design and verification of distributed programs. IEEE Trans. Software Eng., 5(5):440--452, 1979.
  10. P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 519--538, New York, NY, USA, 2005. ACM.
  11. M. Cintra, J. F. Martínez, and J. Torrellas. Architectural support for scalable speculative parallelization in shared-memory multiprocessors. In International Symposium on Computer Architecture (ISCA), pages 13--24, Vancouver, Canada, June 2000.
  12. T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms. MIT Press, 2001.
  13. T. A. Davis. Direct Methods for Sparse Linear Systems (Fundamentals of Algorithms 2). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006.
  14. A. Duran, X. Teruel, R. Ferrer, X. Martorell, and E. Ayguade. Barcelona OpenMP Tasks Suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In Proceedings of the 2009 International Conference on Parallel Processing, ICPP '09, pages 124--131, Washington, DC, USA, 2009. IEEE Computer Society.
  15. The Galois system. http://iss.ices.utexas.edu/?p=projects/galois.
  16. T. Gautier, J. V. Lima, N. Maillard, and B. Raffin. XKaapi: A runtime system for data-flow task programming on heterogeneous architectures. In Parallel and Distributed Processing Symposium, International, pages 1299--1308, 2013.
  17. A. Gupta, S. Koric, and T. George. Sparse matrix factorization on massively parallel computers. In SC '09, 2009.
  18. D. Harmon, E. Vouga, B. Smith, R. Tamstorf, and E. Grinspun. Asynchronous contact mechanics. In ACM SIGGRAPH 2009 Papers, SIGGRAPH '09, pages 87:1--87:12, New York, NY, USA, 2009. ACM.
  19. J.-C. Huang, X. Jiao, R. M. Fujimoto, and H. Zha. DAG-guided parallel asynchronous variational integrators with super-elements. In Proceedings of the 2007 Summer Computer Simulation Conference, SCSC, pages 691--697, San Diego, CA, USA, 2007. Society for Computer Simulation International.
  20. Intel Corporation. Intel Threading Building Blocks 2.0. http://osstbb.intel.com.
  21. D. R. Jefferson. Virtual time. ACM TOPLAS, 7(3), 1985.
  22. F. John. Partial Differential Equations. Springer, 1984.
  23. K. Kennedy and J. Allen, editors. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, 2001.
  24. V. Krishnan and J. Torrellas. A chip-multiprocessor architecture with speculative multithreading. IEEE Trans. Comput., 48(9):866--880, Sept. 1999.
  25. M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI), 42(6):211--222, 2007.
  26. C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '10, 2010.
  27. A. Lew, J. E. Marsden, M. Ortiz, and M. West. Asynchronous variational integrators. Archive for Rational Mechanics and Analysis, 2003.
  28. B. Lubachevsky. Simulating colliding rigid disks in parallel using bounded lag without time warp. In SCS Multiconference, 1990.
  29. P. Marcuello, A. González, and J. Tubella. Speculative multithreaded processors. In Proceedings of the 12th International Conference on Supercomputing, ICS '98, pages 77--84, New York, NY, USA, 1998. ACM.
  30. J. Misra. Distributed discrete-event simulation. ACM Comput. Surv., 18(1):39--65, 1986.
  31. D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 456--471, New York, NY, USA, 2013. ACM.
  32. OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 4.0, July 2013.
  33. J. Perez, R. Badia, and J. Labarta. A dependency-aware task-based programming environment for multi-core architectures. In 2008 IEEE International Conference on Cluster Computing, pages 142--151, Sept. 2008.
  34. K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The Tao of Parallelism in Algorithms. In PLDI 2011, pages 12--25, New York, NY, USA, 2011. ACM.
  35. H. Plummer. On the problem of distribution in globular star clusters. Mon. Not. R. Astron. Soc., 71(460), 1911.
  36. L. Rauchwerger and D. A. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.
  37. J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. In ISCA '00, 2000.
  38. http://icl.cs.utk.edu/dague/.
  39. http://www.cs.utexas.edu/ flame/web/.
  40. http://www.spiral.net/.
