research-article

Programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia

Published: 12 February 2011

Abstract

We describe two novel constructs for programming parallel machines with multi-level memory hierarchies: call-up, which allows a child task to invoke computation on its parent, and spawn, which launches a dynamically determined number of parallel children until some termination condition in the parent is met. We show that together these constructs allow applications with irregular parallelism to be programmed in a straightforward manner, and that they complement, and can be combined with, constructs for expressing regular parallelism. We have implemented spawn and call-up in Sequoia, and we present an experimental evaluation on a number of irregular applications.
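To make the two constructs concrete, here is a minimal sketch of the pattern they express, written in plain Python rather than Sequoia syntax (the `Parent`, `call_up`, and `spawn_until_done` names are invented for illustration and are not the paper's API): a parent keeps spawning child tasks, and a child that finds a result "calls up" into the parent, which sets the termination condition that stops further spawning.

```python
# Hypothetical sketch (not Sequoia syntax): the spawn/call-up pattern
# modeled with Python threads. The parent spawns children until a
# termination condition, set by a child's call-up, is met.
import threading
from concurrent.futures import ThreadPoolExecutor

class Parent:
    def __init__(self):
        self.lock = threading.Lock()
        self.result = None          # set by the first successful call-up

    def call_up(self, value):
        # "call-up": a child invokes computation on its parent's state
        with self.lock:
            if self.result is None:
                self.result = value

    def done(self):
        with self.lock:
            return self.result is not None

def child(parent, chunk):
    # each child searches its own chunk; on success it calls up
    for x in chunk:
        if x % 7 == 0:              # stand-in for "found a solution"
            parent.call_up(x)
            return

def spawn_until_done(data, width=4):
    # "spawn": launch a dynamically determined number of children,
    # stopping once the parent's termination condition holds
    parent = Parent()
    chunks = [data[i:i + width] for i in range(0, len(data), width)]
    with ThreadPoolExecutor(max_workers=width) as pool:
        for c in chunks:
            if parent.done():       # termination condition in the parent
                break
            pool.submit(child, parent, c)
    return parent.result
```

This is the irregular case the paper targets: the number of children actually spawned depends on when (or whether) a call-up fires, so it cannot be fixed at task-launch time the way regular, statically blocked parallelism can.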



Published in

ACM SIGPLAN Notices, Volume 46, Issue 8 (PPoPP '11), August 2011, 300 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2038037

PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, February 2011, 326 pages
ISBN: 9781450301190
DOI: 10.1145/1941553
General Chair: Calin Cascaval
Program Chair: Pen-Chung Yew

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
