research-article

General data structure expansion for multi-threading

Published: 16 June 2013

Abstract

Among techniques for parallelizing sequential code, privatization is a common and significant transformation performed by both compilers and runtime parallelizing systems. Without privatization, repeated updates to the same data structures often introduce spurious data dependencies that hide the inherent parallelism. Unfortunately, it remains a significant challenge for compilers to automatically privatize the dynamic and recursive data structures that appear frequently in real applications written in languages such as C and C++. This is because such languages lack a naming mechanism that defines the address range of a pointer-based data structure, in contrast to arrays with explicitly declared bounds. In this paper we present a novel solution to this difficult problem: we expand general data structures so that memory accesses issued from different threads to contended data structures are directed to different data fields. Based on compile-time type checking and a data dependence graph, this aggressive extension of traditional scalar and array expansion isolates the address ranges of different threads, without the difficulties of privatization based on thread-private stacks, so that the targeted loop can be effectively parallelized. The method is fully implemented in GCC, and experiments are conducted on a set of programs from well-known benchmark suites such as MiBench, MediaBench II, and SPECint. Results show that the new approach can lead to a high speedup when the transformed code is executed on multiple cores.
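
The idea of directing each thread to its own data fields can be illustrated with a small C sketch. This is an illustration under stated assumptions, not the layout or code the paper's GCC pass generates: the types `cell` and `cell_x`, the pools, the constants `NTHREADS` and `NCELLS`, and the functions `run_seq` and `run_expanded` are all hypothetical. The loop body first writes a pointer-linked scratch chain and then reads it back, so the only cross-iteration dependences come from reusing the same storage. In the expanded version every field gains one slot per thread, so concurrent iterations touch disjoint addresses.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#define THREAD_ID() omp_get_thread_num()
#else
#define THREAD_ID() 0            /* sequential fallback when OpenMP is off */
#endif

#define NTHREADS 4               /* hypothetical thread count */
#define NCELLS   8               /* hypothetical scratch chain length */

/* Original layout: one linked scratch chain reused by every iteration. */
struct cell {
    long value;
    struct cell *next;
};

/* Expanded layout (assumed for illustration): one slot per thread per field. */
struct cell_x {
    long value[NTHREADS];
    struct cell_x *next[NTHREADS];
};

static struct cell   pool[NCELLS];    /* shared scratch, sequential version */
static struct cell_x pool_x[NCELLS];  /* expanded scratch */

/* Sequential body: builds the chain in shared storage, then sums it. */
static long body_seq(long i) {
    struct cell *head = NULL;
    for (int k = 0; k < NCELLS; k++) {
        pool[k].value = i * (k + 1);
        pool[k].next = head;
        head = &pool[k];
    }
    long sum = 0;
    for (struct cell *c = head; c != NULL; c = c->next)
        sum += c->value;
    return sum;
}

/* Expanded body: same logic, but every field access is indexed by tid. */
static long body_exp(long i, int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    struct cell_x *head = NULL;
    for (int k = 0; k < NCELLS; k++) {
        pool_x[k].value[tid] = i * (k + 1);
        pool_x[k].next[tid] = head;
        head = &pool_x[k];
    }
    long sum = 0;
    for (struct cell_x *c = head; c != NULL; c = c->next[tid])
        sum += c->value[tid];
    return sum;
}

void run_seq(long *out, long n) {
    for (long i = 0; i < n; i++)
        out[i] = body_seq(i);
}

void run_expanded(long *out, long n) {
    /* Iterations are independent once each thread uses its own slots. */
    #pragma omp parallel for num_threads(NTHREADS)
    for (long i = 0; i < n; i++)
        out[i] = body_exp(i, THREAD_ID());
}
```

Calling `run_seq` and `run_expanded` on the same `n` should fill both output arrays with identical values. Compiled without OpenMP support, the pragma is ignored and the expanded loop simply runs on slot 0. The per-field arrays used here are only one possible expansion layout; the sketch makes no claim about how the paper chooses between layouts or handles heap allocation.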



    • Published in

      ACM SIGPLAN Notices, Volume 48, Issue 6
      PLDI '13
      June 2013
      515 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2499370

      PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2013
      546 pages
      ISBN: 9781450320146
      DOI: 10.1145/2491956

      Copyright © 2013 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

