research-article

Effective parallelization of loops in the presence of I/O operations

Published: 11 June 2012

Abstract

Software-based thread-level parallelization has been widely studied for exploiting data parallelism in purely computational loops to improve program performance on multiprocessors. However, none of the previous efforts deals with efficient parallelization of hybrid loops, i.e., loops that contain a mix of computation and I/O operations. In this paper, we propose a set of techniques for efficiently parallelizing hybrid loops. Our techniques apply DOALL parallelism to hybrid loops by breaking the cross-iteration dependences caused by I/O operations. We also support speculative execution of I/O operations to enable speculative parallelization of hybrid loops. Helper threading is used to reduce the I/O bus contention caused by the improved parallelism. We provide an easy-to-use programming model for exploiting parallelism in loops with I/O operations. Parallelizing hybrid loops using our model requires few modifications to the code. We have developed a prototype implementation of our programming model. We have evaluated our implementation on a 24-core machine using eight applications, including a widely used genomic sequence assembler and a multi-player game server, and others from the PARSEC and SPEC CPU2000 benchmark suites. The hybrid loops in these applications take 23%-99% of the total execution time on our 24-core machine. The parallelized applications achieve speedups of 3.0x-12.8x with hybrid loop parallelization over the sequential versions of the same applications. Compared to the versions of the applications where only computation loops are parallelized, hybrid loop parallelization improves the application performance by 68% on average.
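
To make the idea of a hybrid loop concrete, the following is a minimal sketch of one way an I/O-induced cross-iteration dependence could be broken. It is not the paper's programming model or implementation, which the abstract does not spell out. It assumes fixed-size input records (`RECORD_SIZE`), so that iteration `i` can compute its own file offset, and it buffers each iteration's output so results are written in the original order. The names `process`, `sequential_hybrid_loop`, and `parallel_hybrid_loop` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumption: records have a fixed size, so iteration i starts at i * RECORD_SIZE.
RECORD_SIZE = 8


def process(record: bytes) -> bytes:
    """Stand-in for the computational part of one loop iteration."""
    return record.upper()


def sequential_hybrid_loop(in_path: str, out_path: str, n: int) -> None:
    # Every read and write advances a shared file position, so iteration i
    # implicitly depends on iteration i - 1 even if the computation does not.
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for _ in range(n):
            dst.write(process(src.read(RECORD_SIZE)))


def parallel_hybrid_loop(in_path: str, out_path: str, n: int, workers: int = 4) -> None:
    fd = os.open(in_path, os.O_RDONLY)
    try:
        def iteration(i: int) -> bytes:
            # os.pread (Unix only) reads at an explicit offset and leaves the
            # file position untouched, so iterations share no input state.
            return process(os.pread(fd, RECORD_SIZE, i * RECORD_SIZE))

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() returns results in iteration order, which acts as a
            # per-iteration output buffer.
            results = list(pool.map(iteration, range(n)))
    finally:
        os.close(fd)

    # Flush the buffered outputs in the original iteration order.
    with open(out_path, "wb") as dst:
        for chunk in results:
            dst.write(chunk)
```

Both functions should produce byte-identical output files for the same input. The sketch shows only the restructuring: in CPython, threads do not speed up pure-Python computation, and speculative I/O and helper threading, which the abstract also mentions, are not modeled here.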


Published in

ACM SIGPLAN Notices, Volume 47, Issue 6 (PLDI '12)
June 2012, 534 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2345156

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2012, 572 pages
ISBN: 9781450312059
DOI: 10.1145/2254064

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
