OoOJava: Software Out-of-Order Execution

Abstract
Developing parallel software using current tools can be challenging. Even experts find it difficult to reason about the use of locks and often accidentally introduce race conditions and deadlocks into parallel software. OoOJava is a compiler-assisted approach that leverages developer annotations along with static analysis to provide an easy-to-use deterministic parallel programming model. OoOJava extends Java with a task annotation that instructs the compiler to consider a code block for out-of-order execution. OoOJava executes tasks as soon as their data dependences are resolved and guarantees that the execution of an annotated program preserves the exact semantics of the original sequential program. We have implemented OoOJava and achieved an average speedup of 16.6x on our ten benchmarks.
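The abstract does not show OoOJava's annotation syntax, so as a rough analogy only (standard Java futures, not OoOJava), the sketch below makes the data dependences explicit that OoOJava's compiler would infer automatically from a task annotation on a plain sequential program: independent tasks may run out of order, while a dependent task waits for its inputs.

```java
import java.util.concurrent.CompletableFuture;

public class TaskDemo {
    public static void main(String[] args) {
        // Two tasks with no data dependence on each other:
        // they may execute out of order or in parallel.
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 2 * 3);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 4 + 5);

        // This task reads both results, so it starts only once its data
        // dependences (a and b) have resolved -- the ordering constraint
        // OoOJava enforces automatically to preserve sequential semantics.
        int sum = a.thenCombine(b, Integer::sum).join();
        System.out.println(sum);
    }
}
```

The key contrast: here the programmer wires the dependence graph by hand with futures, whereas OoOJava recovers it statically, so the annotated program keeps the exact semantics of the original sequential code.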
Published in PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming.