Abstract
For decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can only be exploited by changing or re-ordering the dependences in the program.
This paper presents Alter: a system for identifying and enforcing parallelism that violates certain dependences while preserving overall program functionality. Based on programmer annotations, Alter exploits new parallelism in loops by reordering iterations or allowing stale reads. Alter can also infer which annotations are likely to benefit the program by using a test-driven framework.
Our evaluation of Alter demonstrates that it uncovers parallelism that is beyond the reach of existing static and dynamic tools. Across a selection of 12 performance-intensive loops, 9 of which have loop-carried dependences, Alter obtains an average speedup of 2.0x on 4 cores.
- The Parallel Dwarfs Project. http://paralleldwarfs.codeplex.com.Google Scholar
- F. Aleen and N. Clark. Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture. In ASPLOS, 2009. Google Scholar
Digital Library
- P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: An Adaptive, Generic Parallel C++ Library. LNCS, 2624:195--210, 2003. Google Scholar
Digital Library
- K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. CACM, 52(10), 2009. Google Scholar
Digital Library
- E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In ASPLOS, 2000. Google Scholar
Digital Library
- E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe Multithreaded Programming for C/C++. In OOPSLA, 2009. Google Scholar
Digital Library
- M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO, 2007. Google Scholar
Digital Library
- S. Burckhardt, A. Baldassion, and D. Leijen. Concurrent Programming with Revisions and Isolation Types. In OOPSLA, 2010. Google Scholar
Digital Library
- M. Burke and R. Cytron. Interprocedural dependence analysis and parallelization. SIGPLAN Notices, 2004. Google Scholar
Digital Library
- M. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University, June 1996. Google Scholar
Digital Library
- T. Chen, F. Min, V. Nagarajan, and R. Gupta. Copy or Discard Execution Model for Speculative Parallelization on Multicores. In MICRO, 2008. Google Scholar
Digital Library
- A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser Boston, 1st edition, 2000. Google Scholar
Digital Library
- K. Datta. Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. PhD thesis, UC Berkeley, December 2009. Google Scholar
Digital Library
- R. Dias, J. Seco, and J. Lourenco. Snapshot isolation anomalies detection in software transactional memory. In INForum, 2010.Google Scholar
- C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software Behavior Oriented Parallelization. In PLDI, 2007. Google Scholar
Digital Library
- R. Guerraoui and M. Kapalka. On the Correctness of Transactional Memory. In PPoPP, 2008. Google Scholar
Digital Library
- M. Hall, J. Anderson, S. Amarasinghe, B. Murphy, S. Liao, E. Bugnion, and M. Lam. Maximizing Multiprocessor Performance with the SUIF Compiler. IEEE Computer, 29:84--89, 1996. Google Scholar
Digital Library
- M. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-free Data Structures. In ISCA, 1993. Google Scholar
Digital Library
- W. Hwu, S. Ryoo, S. Ueng, J. Kelm, I. Gelado, S. Stone, R. Kidd, S. Baghsorkhi, A. Mahesri, S. Tsao, N. Navarro, S. Lumetta, M. Frank, and S. Patel. Implicitly parallel programming models for thousand-core microprocessors. In DAC, 2007. Google Scholar
Digital Library
- K. Kennedy and J. Allen. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc., 2002. Google Scholar
Digital Library
- V. Krishnan and J. Torrellas. A Chip-Multiprocessor Architecture with Speculative Multithreading. IEEE Trans. on Computers, 48(9), 1999. Google Scholar
Digital Library
- M. Kulkarni, M. Burtscher, C. Cascaval, and K. Pingali. Lonestar: A Suite of Parallel Irregular Programs. In ISPASS, 2009.Google Scholar
Cross Ref
- M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic Parallelism Benefits from Data Partitioning. In ASPLOS, 2008. Google Scholar
Digital Library
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic Parallelism Requires Abstractions. In PLDI, 2007. Google Scholar
Digital Library
- J. Larus and R. Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007. Google Scholar
Digital Library
- D. Maydan, S. Amarasinghe, and M. Lam. Array data-flow analysis and its use in array privatization. In POPL, 1993. Google Scholar
Digital Library
- M. Mehrara, J. Hao, P. Hsu, and S. Mahlke. Parallelizing Sequential applications on Commodity Hardware using a Low-cost Software Transactional Memory. In PLDI, 2009. Google Scholar
Digital Library
- C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford Transactional Applications for Multi-Processing. In IISWC, 2008.Google Scholar
- S. Misailovic, D. Kim, and M. Rinard. Parallelizing Sequential Programs With Statistical Accuracy Tests. Technical Report MIT-CSAIL-TR-2010-038, MIT, Aug 2010.Google Scholar
- S. Moon and M. W. Hall. Evaluation of Predicated Array Data-flow Analysis for Automatic Parallelization. In PPoPP, 1999. Google Scholar
Digital Library
- W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.Google Scholar
- W. Pugh and D. Wonnacott. Constraint-based array dependence analysis. ACM TOPLAS., 1998. Google Scholar
Digital Library
- E. Raman, N. Vachharajani, R. Rangan, and D. August. SPICE: Speculative Parallel Iteration Chunk Execution. In CGO, 2008. Google Scholar
Digital Library
- L. Rauchwerger and D. Padua. The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization. In ICS, 1994. Google Scholar
Digital Library
- L. Rauchwerger and D. Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In PLDI, 1995. Google Scholar
Digital Library
- T. Reps. Undecidability of context-sensitive data-independence analysis. ACM TOPLAS, 2000. Google Scholar
Digital Library
- T. Riegel, C. Fetzer, and P. Felber. Snapshot isolation for software transactional memory. In TRANSACT, 2006.Google Scholar
- M. Rinard and P. Diniz. Commutativity Analysis: A New Analysis Technique for Parallelizing Compilers. ACM TOPLAS, 19(6), 1997. Google Scholar
Digital Library
- G. Sohi, S. Breach, and T. N. Vijaykumar. Multiscalar Processors. In ISCA, 1995. Google Scholar
Digital Library
- R. Tarjan. A Unified Approach to Path Problems. J. ACM, 28(3), 1981. Google Scholar
Digital Library
- W. Thies, V. Chandrasekhar, and S. Amarasinghe. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs. In MICRO, 2007. Google Scholar
Digital Library
- C. Tian, M. Feng, and R. Gupta. Supporting Speculative Parallelization in the Presence of Dynamic Data Structures. In PLDI, 2010. Google Scholar
Digital Library
- G. Tournavitis, Z. Wang, B. Franke, and M. O'Boyle. Towards a Holistic Approach to Auto-parallelization: Integrating Profile-driven Parallelism Detection and Machine-learning Based Mapping. PLDI, 2009. Google Scholar
Digital Library
- N. Vachharajani, R. Rangan, E. Raman, M. Bridges, G. Ottoni, and D. August. Speculative Decoupled Software Pipelining. In PACT, 2007. Google Scholar
Digital Library
- H. Vandierendonck, S. Rul, and K. Bosschere. The Paralax infrastructure: automatic parallelization with a helping hand. In PACT, 2010. Google Scholar
Digital Library
Index Terms
ALTER: exploiting breakable dependences for parallelization
Recommendations
ALTER: exploiting breakable dependences for parallelization
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationFor decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can ...
Speculative separation for privatization and reductions
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and ImplementationAutomatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic ...
Parallel-stage decoupled software pipelining
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimizationIn recent years, the microprocessor industry has embraced chip multiprocessors (CMPs), also known as multi-core architectures, as the dominant design paradigm. For existing and new applications to make effective use of CMPs, it is desirable that ...







Comments