Abstract
Software-based thread-level parallelization has been widely studied for exploiting data parallelism in purely computational loops to improve program performance on multiprocessors. However, none of the previous efforts deals with efficient parallelization of hybrid loops, i.e., loops that contain a mix of computation and I/O operations. In this paper, we propose a set of techniques for efficiently parallelizing hybrid loops. Our techniques apply DOALL parallelism to hybrid loops by breaking the cross-iteration dependences caused by I/O operations. We also support speculative execution of I/O operations to enable speculative parallelization of hybrid loops. Helper threading is used to reduce the I/O bus contention caused by the increased parallelism. We provide an easy-to-use programming model for exploiting parallelism in loops with I/O operations; parallelizing hybrid loops using our model requires few modifications to the code. We have developed a prototype implementation of our programming model and evaluated it on a 24-core machine using eight applications, including a widely used genomic sequence assembler, a multi-player game server, and programs from the PARSEC and SPEC CPU2000 benchmark suites. The hybrid loops in these applications take 23% to 99% of the total execution time on our 24-core machine. With hybrid loop parallelization, the parallelized applications achieve speedups of 3.0x to 12.8x over the sequential versions of the same applications. Compared to versions of the applications in which only computation loops are parallelized, hybrid loop parallelization improves application performance by 68% on average.
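As an illustration of what a cross-iteration dependence caused by I/O looks like, and one way it can be broken, the following is a minimal sketch and not the paper's programming model or implementation. It assumes fixed-size input records, so each iteration can compute its own input offset instead of relying on a shared file position, and it buffers each iteration's output so that results are written in iteration order. The names `RECORD_SIZE`, `process`, `iteration`, `sequential_loop`, and `parallel_loop` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

RECORD_SIZE = 8  # hypothetical fixed record size in bytes (an assumption)


def process(record: bytes) -> bytes:
    """Stand-in for the computation part of one loop iteration."""
    return record.upper()


def sequential_loop(in_path: str, out_path: str, n: int) -> None:
    # Hybrid loop: every iteration reads and writes through shared file
    # positions, so iteration i depends on iteration i-1 via the I/O state.
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for _ in range(n):
            dst.write(process(src.read(RECORD_SIZE)))


def iteration(in_path: str, i: int) -> bytes:
    # Private file handle plus an explicit offset: the input dependence on
    # a shared file position disappears.
    with open(in_path, "rb") as src:
        src.seek(i * RECORD_SIZE)
        return process(src.read(RECORD_SIZE))


def parallel_loop(in_path: str, out_path: str, n: int, workers: int = 4) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in iteration order, so the buffered outputs
        # are written in the same order as in the sequential loop.
        results = pool.map(lambda i: iteration(in_path, i), range(n))
        with open(out_path, "wb") as dst:
            for chunk in results:
                dst.write(chunk)
```

Both loops should produce byte-identical output files for the same input. Python threads are used here only to show the dependence structure; they are not meant to reproduce the speedups reported above.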
Effective parallelization of loops in the presence of I/O operations
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation