Programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia

Abstract
We describe two novel constructs for programming parallel machines with multi-level memory hierarchies: call-up, which allows a child task to invoke computation on its parent, and spawn, which creates a dynamically determined number of parallel children until some termination condition in the parent is met. We show that together these constructs allow applications with irregular parallelism to be programmed in a straightforward manner, and that they complement and can be combined with constructs for expressing regular parallelism. We have implemented spawn and call-up in Sequoia and present an experimental evaluation on a number of irregular applications.
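The two constructs can be illustrated with a minimal conceptual sketch. This is Python, not Sequoia syntax, and every name in it (Parent, call_up, spawn_until_done, and so on) is an illustrative assumption rather than the paper's actual API: the parent owns the termination condition and keeps launching children, while each child "calls up" to run a small computation in the parent's context.

```python
# Conceptual sketch of spawn and call-up (illustrative Python, NOT
# Sequoia syntax): the parent keeps launching children until its own
# termination condition holds, and each child "calls up" to run a
# small computation in the parent's context.
import threading
from concurrent.futures import ThreadPoolExecutor

class Parent:
    def __init__(self, target):
        self.target = target        # terminate once this many results exist
        self.results = []
        self.lock = threading.Lock()

    def call_up(self):
        # Child-invoked computation on the parent: here it hands out the
        # next work item (a call-up could also aggregate partial results).
        with self.lock:
            return len(self.results)

    def record(self, value):
        with self.lock:
            self.results.append(value)

    def done(self):
        with self.lock:
            return len(self.results) >= self.target

def child(parent):
    item = parent.call_up()         # call-up into the parent
    parent.record(item * item)      # child's own computation

def spawn_until_done(parent, width=4):
    # spawn: the number of children is not fixed in advance; the parent
    # launches waves of tasks until its termination condition is met.
    with ThreadPoolExecutor(max_workers=width) as pool:
        while not parent.done():
            wave = [pool.submit(child, parent) for _ in range(width)]
            for f in wave:
                f.result()          # wait, surfacing any child exception

parent = Parent(target=10)
spawn_until_done(parent)
print(len(parent.results) >= 10)   # prints True: parent's condition is met
```

The point of the sketch is the control-flow shape, not the implementation: the child never knows how many siblings exist, and the only upward channel is the call-up into the parent, which is what makes irregular work (e.g. work lists whose size is discovered at runtime) expressible.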
PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming