DOI: 10.1145/1297027.1297055
Article

Using early phase termination to eliminate load imbalances at barrier synchronization points

Published: 21 October 2007

ABSTRACT

We present a new technique, early phase termination, for eliminating idle processors in parallel computations that use barrier synchronization. This technique simply terminates each parallel phase as soon as there are too few remaining tasks to keep all of the processors busy.
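The termination rule can be illustrated with a small simulation. The sketch below is not the paper's implementation: the function name `run_phase`, the greedy first-come scheduling policy, and the task durations are all assumptions made for illustration. It models one phase in which idle processors pull the next pending task, and with `early_termination=True` it ends the phase the first time a processor finishes with no pending task left to start, discarding the tasks still in flight.

```python
import heapq


def run_phase(task_durations, num_procs, early_termination=False):
    """Simulate one parallel phase under greedy dynamic scheduling.

    Hypothetical model for illustration only. Returns a tuple
    (phase_end_time, completed_task_indices).
    """
    # Tasks not yet started, reversed so pop() yields them in original order.
    pending = list(enumerate(task_durations))[::-1]
    running = []  # min-heap of (finish_time, task_index)
    now = 0.0
    completed = []

    # Give every processor an initial task while tasks remain.
    while pending and len(running) < num_procs:
        idx, duration = pending.pop()
        heapq.heappush(running, (now + duration, idx))

    while running:
        now, idx = heapq.heappop(running)
        completed.append(idx)
        if pending:
            # The freed processor immediately starts the next task.
            next_idx, duration = pending.pop()
            heapq.heappush(running, (now + duration, next_idx))
        elif early_termination:
            # A processor would now sit idle at the barrier:
            # end the phase and discard the tasks still running.
            break

    return now, completed


if __name__ == "__main__":
    durations = [1, 1, 1, 1, 5]  # one long task arrives last
    print(run_phase(durations, num_procs=2))
    print(run_phase(durations, num_procs=2, early_termination=True))
```

In this toy input the conventional barrier leaves one processor idle while the long final task runs; with early termination the phase ends as soon as that idling would begin, at the cost of dropping the long task's contribution.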

Although this technique completely eliminates the idling that would otherwise occur at barrier synchronization points, it may also change the computation and therefore the result that the computation produces. We address this issue by providing probabilistic distortion models that characterize how the use of early phase termination distorts the result that the computation produces. Our experimental results show that for our set of benchmark applications, 1) early phase termination can improve the performance of the parallel computation, 2) the distortion is small (or can be made small with an appropriate compensation technique), and 3) the distortion models provide accurate and tight distortion bounds. These bounds can enable users to evaluate the effect of early phase termination and confidently accept results from parallel computations that use this technique if they find the distortion bounds to be acceptable.
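To make the notion of distortion concrete, the sketch below assumes a phase whose result is the sum of per-task contributions. The abstract does not specify the paper's distortion models or its compensation technique, so the rescaling used here (multiplying the partial sum by the ratio of total tasks to completed tasks) is only one plausible compensation, and all names and values are hypothetical.

```python
def partial_sum(contributions, completed):
    """Sum only the contributions of the tasks that finished."""
    return sum(contributions[i] for i in completed)


def compensated_sum(contributions, completed):
    """Assumed compensation: extrapolate the partial sum to all tasks."""
    if not completed:
        raise ValueError("no completed tasks to extrapolate from")
    scale = len(contributions) / len(completed)
    return partial_sum(contributions, completed) * scale


def relative_distortion(exact, approx):
    """Relative error of the approximate result against the exact one."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    contributions = [10.0, 12.0, 9.0, 11.0, 8.0]  # illustrative values
    completed = [0, 1, 2, 3]                      # task 4 was discarded
    exact = sum(contributions)
    raw = partial_sum(contributions, completed)
    fixed = compensated_sum(contributions, completed)
    print(relative_distortion(exact, raw))
    print(relative_distortion(exact, fixed))
```

Rescaling of this kind helps only when the discarded tasks resemble the completed ones; a bound on the remaining error is what a distortion model would have to supply.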

Finally, we identify a general computational pattern that works well with early phase termination and explain why computations that exhibit this pattern can tolerate the early termination of parallel tasks without producing unacceptable results.

