ABSTRACT
We present Regent, a high-productivity programming language for high performance computing with logical regions. Regent users compose programs with tasks (functions eligible for parallel execution) and logical regions (hierarchical collections of structured objects). Regent programs appear to execute sequentially, require no explicit synchronization, and are trivially deadlock-free. Regent's type system catches many common classes of mistakes and guarantees that a program with correct serial execution produces identical results on parallel and distributed machines.
We present an optimizing compiler for Regent that translates Regent programs into efficient implementations for Legion, an asynchronous task-based model. Regent employs several novel compiler optimizations to minimize the dynamic overhead of the runtime system and enable efficient operation. We evaluate Regent on three benchmark applications and demonstrate that Regent achieves performance comparable to hand-tuned Legion.
Regent: a high-productivity programming language for HPC with logical regions