Abstract
Applications written for distributed-memory parallel architectures must partition their data to enable parallel execution. As memory hierarchies become deeper, it is increasingly necessary that the data partitioning also be hierarchical to match. Current language proposals perform this hierarchical partitioning statically, which excludes many important applications where the appropriate partitioning is itself data-dependent and so must be computed dynamically. We describe Legion, a region-based programming system in which each region may be partitioned into subregions. Partitions are computed dynamically and are fully programmable. The division of data need not be disjoint: subregions of a region may overlap, or alias, one another. Computations use regions with certain privileges (e.g., expressing that a computation uses a region read-only) and data coherence (e.g., expressing that the computation need only be atomic with respect to other operations on the region), both of which can be controlled on a per-region (or per-subregion) basis.
We present the novel aspects of the Legion design, in particular the combination of static and dynamic checks used to enforce soundness. We give an extended example illustrating how Legion can express computations with dynamically determined relationships between computations and data partitions. We prove the soundness of Legion's type system, and show that Legion's type checking improves performance by up to 71% by eliding provably safe memory checks. In particular, we show that the dynamic checks that detect aliasing at runtime, performed at region granularity, have negligible overhead. We report results for three real-world applications running on distributed-memory machines, achieving up to 62.5X speedup on 96 GPUs on the Keeneland supercomputer.
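The abstract's core ideas — programmable partitions that may alias, a region-granularity disjointness check, and per-region privileges — can be illustrated with a small conceptual model. This is a hedged sketch in Python, not Legion's actual API; the names `Region`, `partition`, `is_disjoint`, and `run_task` are illustrative inventions, and real Legion performs these checks in a typed C-like language with a distributed runtime.

```python
# Conceptual sketch (NOT the Legion API) of dynamic, possibly-aliased
# partitioning and privilege enforcement, as described in the abstract.

class Region:
    """A region is a named collection of data elements (index -> value)."""
    def __init__(self, data):
        self.data = dict(data)

    def partition(self, coloring):
        """Programmable partition: `coloring` maps each index to a set of
        colors. Because an index may receive several colors, subregions
        may overlap (alias) -- partitions need not be disjoint."""
        buckets = {}
        for idx, colors in coloring.items():
            for color in colors:
                buckets.setdefault(color, {})[idx] = self.data[idx]
        return Partition({c: Region(d) for c, d in buckets.items()})

class Partition:
    def __init__(self, subregions):
        self.subregions = subregions

    def is_disjoint(self):
        """Dynamic aliasing check at region granularity: compares index
        sets per subregion, not per element access."""
        seen = set()
        for sub in self.subregions.values():
            if seen & sub.data.keys():
                return False        # some index appears in two subregions
            seen |= sub.data.keys()
        return True

def run_task(region, privilege, body):
    """Run `body` on `region` under a declared privilege; a read-only
    task that mutates its region is a soundness violation."""
    before = dict(region.data)
    result = body(region)
    if privilege == "read_only" and region.data != before:
        raise RuntimeError("read-only task wrote to its region")
    return result
```

For example, a coloring computed from the data itself (here, assigning index 1 to both colors) yields an aliased partition that `is_disjoint` detects dynamically, while a one-color-per-index coloring yields a disjoint one on which read-only tasks can run safely in parallel.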
Language support for dynamic, hierarchical data partitioning. In OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013.