Online feedback-directed optimizations for parallel Java code

Published: 29 October 2013

Abstract

The performance of parallel code significantly depends on the parallel task granularity (PTG). If the PTG is too coarse, performance suffers due to load imbalance; if the PTG is too fine, performance suffers from the overhead that is induced by parallel task creation and scheduling.

This paper presents a software platform that automatically determines the PTG at run-time. Automatic PTG selection is enabled by concurrent calls, special source-language constructs that defer the decision of whether a call is executed sequentially or concurrently (as a parallel task) until run-time. Furthermore, the execution semantics of concurrent calls permits the runtime system to merge two (or more) concurrent calls, thereby coarsening the PTG. We present an integration of concurrent calls into the Java programming language and the Java Memory Model, and show how the Java Virtual Machine can adapt the PTG based on dynamic profiling. The performance evaluation shows that our runtime system performs competitively with Java programs for which the PTG is tuned manually. Compared to an unfortunate choice of the PTG, this approach performs up to 3x faster than standard Java code.
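To make the granularity trade-off concrete, here is a minimal sketch in plain Java of the underlying idea: a call site that is executed inline when the work is below a threshold and spawned as a parallel task otherwise. This is not the paper's concurrent-call construct or its online-tuned runtime; `SumTask` and the fixed `CUTOFF` are illustrative assumptions, using the standard `java.util.concurrent` fork/join API, where the paper's system would instead adjust the threshold from dynamic profiles.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: decide per call whether to run sequentially or
// as a parallel task. CUTOFF stands in for the granularity threshold
// that the paper's runtime tunes online via profiling.
class SumTask extends RecursiveTask<Long> {
    static final int CUTOFF = 10_000; // assumed fixed here; tuned at run-time in the paper
    final long[] data; final int lo, hi;

    SumTask(long[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    @Override protected Long compute() {
        if (hi - lo <= CUTOFF) {           // fine-grained work: execute sequentially
            long s = 0;                    // (the "merged" / inlined case)
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;         // coarse-grained work: spawn a parallel task
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                       // concurrent execution of the left half
        long right = new SumTask(data, mid, hi).compute();
        return right + left.join();
    }
}
```

A caller would run `new ForkJoinPool().invoke(new SumTask(arr, 0, arr.length))`. With a threshold that is too small, task creation and scheduling overhead dominates; too large, and load imbalance appears — exactly the trade-off the abstract describes.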


Published in

ACM SIGPLAN Notices, Volume 48, Issue 10 (OOPSLA '13), October 2013, 867 pages
ISSN: 0362-1340, EISSN: 1558-1160, DOI: 10.1145/2544173

Also published in OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN International Conference on Object-Oriented Programming Systems Languages & Applications, October 2013, 904 pages, ISBN: 9781450323741, DOI: 10.1145/2509136

Copyright © 2013 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifier: research-article