Abstract
A key problem in parallel programming is how data is partitioned: divided into subsets that can be operated on in parallel and, in distributed memory machines, spread across multiple address spaces.
We present a dependent partitioning framework that allows an application to concisely describe relationships between partitions. Applications first establish independent partitions, which may contain arbitrary subsets of application data, permitting the expression of arbitrary application-specific data distributions. Dependent partitions are then derived from these using the dependent partitioning operations provided by the framework. By directly capturing inter-partition relationships, our framework can soundly and precisely reason about programs, enabling program analyses that are crucial for ensuring correctness and achieving good performance. As an example of the reasoning made possible, we present a static analysis that discharges most consistency checks on partitioned data during compilation.
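The split between independent and dependent partitions can be illustrated with a minimal Python sketch. It assumes a toy representation that is not the framework's or Regent's API: a partition is a dict from a color to a set of element indices, and a pointer field is a dict from element index to element index. The helpers `preimage`, `image`, and `difference` are hypothetical names for derivation operations of this kind. The application writes only the independent node partition. Everything else, including the "ghost" nodes each piece reads but does not own, is derived from it.

```python
from typing import Dict, Hashable, Set

# Toy representation (an assumption of this sketch): color -> set of element indices.
Partition = Dict[Hashable, Set[int]]


def preimage(target: Partition, pointer: Dict[int, int]) -> Partition:
    """Piece c of the result: source elements whose pointer lands in target piece c."""
    return {c: {e for e, p in pointer.items() if p in elems}
            for c, elems in target.items()}


def image(source: Partition, pointer: Dict[int, int]) -> Partition:
    """Piece c of the result: target elements pointed to by source piece c."""
    return {c: {pointer[e] for e in elems} for c, elems in source.items()}


def difference(a: Partition, b: Partition) -> Partition:
    """Piecewise set difference of two partitions that share colors."""
    return {c: a[c] - b.get(c, set()) for c in a}


# Hypothetical toy mesh: 4 nodes in a ring; edge i runs from edge_src[i] to edge_dst[i].
edge_src = {0: 0, 1: 1, 2: 2, 3: 3}
edge_dst = {0: 1, 1: 2, 2: 3, 3: 0}

# Independent partition of nodes, chosen freely by the application.
node_parts: Partition = {"A": {0, 1}, "B": {2, 3}}

# Dependent partitions, derived rather than hand-computed:
edge_parts = preimage(node_parts, edge_src)  # edges owned by each node piece
needed = image(edge_parts, edge_dst)         # nodes each edge piece touches
ghost = difference(needed, node_parts)       # touched nodes owned by another piece

print(edge_parts)  # {'A': {0, 1}, 'B': {2, 3}}
print(ghost)       # {'A': {2}, 'B': {0}}
```

Because each derived partition records how it was computed from `node_parts`, facts such as "every edge in piece c points into `needed[c]`" hold by construction. This is the kind of inter-partition relationship the abstract says the framework can reason about.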
We describe an implementation of our framework within Regent, a language designed for the Legion programming model. Using dependent partitioning constructs reduces the lines of code required to describe the partitioning by 86-96%, eliminates many of the expensive dynamic checks that the current Regent partitioning implementation requires for soundness, and speeds up the computation of partitions by 2.6-12.7X even on a single thread. Additionally, we show that a distributed implementation incorporated into the Legion runtime system allows partitioning of data sets that are too large to fit on a single node and yields a further 29X speedup of partitioning operations on 64 nodes.
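The following hypothetical Python sketch shows what a dynamic soundness check of this kind might look like. It is based on our own assumptions and is not Regent's actual check. Before a task reads a pointer field from piece c of one collection, a runtime could verify that every pointer target lies in piece c of the partition granted to that task. This costs time proportional to the number of pointers on every run. When the granted partition is instead derived as the image of the source piece through the same field, the check passes by construction. That is the sort of fact a static analysis can plausibly discharge at compile time.

```python
from typing import Dict, Hashable, Set

# Toy representation (an assumption of this sketch): color -> set of element indices.
Partition = Dict[Hashable, Set[int]]


def pointers_in_bounds(source: Partition, target: Partition,
                       pointer: Dict[int, int]) -> bool:
    """Hypothetical dynamic check: every pointer read from source piece c
    must land inside target piece c."""
    return all(pointer[e] in target[c]
               for c, elems in source.items() for e in elems)


# Hypothetical data: 4 edges in a ring of 4 nodes, split into two pieces.
edge_dst = {0: 1, 1: 2, 2: 3, 3: 0}
edge_parts: Partition = {"A": {0, 1}, "B": {2, 3}}

# Hand-written target partition (owned nodes only): the check must run, and it fails.
owned_nodes: Partition = {"A": {0, 1}, "B": {2, 3}}
print(pointers_in_bounds(edge_parts, owned_nodes, edge_dst))  # False

# Target derived as the image of edge_parts through edge_dst: holds by construction.
image_nodes = {c: {edge_dst[e] for e in elems} for c, elems in edge_parts.items()}
print(pointers_in_bounds(edge_parts, image_nodes, edge_dst))  # True
```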
OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications