Research Article

Automatic parallelization via matrix multiplication

Published: 04 June 2011

Abstract

Existing work on the parallelization of complicated reductions and scans focuses mainly on formalism and rarely addresses implementation. To bridge this gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing compiler techniques cannot parallelize. Moreover, we have refined our framework with two sets of techniques: one enhances its capability for parallelization by extracting max-operators automatically, and the other improves the performance of parallelized programs by eliminating redundancy. We have also implemented our framework and techniques as a parallelizer in a compiler. Experiments on examples that existing compilers cannot parallelize demonstrate the scalability of the programs parallelized by our implementation.
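To illustrate the underlying idea (a minimal sketch, not the paper's actual implementation): a sequential linear recurrence such as x = a[i]*x + b[i] can be rewritten so that each loop iteration becomes a 2x2 matrix acting on the vector (x, 1). Because matrix multiplication is associative, the product of these per-iteration matrices can be parenthesized arbitrarily, which is exactly what allows a tree-shaped parallel reduction. The function names below (`step_matrix`, `run_recurrence`) are illustrative, not from the paper.

```python
# Hedged sketch: parallelization of a linear recurrence via matrix
# multiplication. Each loop step x -> a_i*x + b_i becomes a 2x2 matrix;
# the associative product of these matrices summarizes the whole loop.
from functools import reduce

def step_matrix(a_i, b_i):
    # One iteration x -> a_i*x + b_i, represented as a matrix on (x, 1).
    return ((a_i, b_i),
            (0.0, 1.0))

def matmul(m, n):
    # 2x2 matrix multiplication: the associative combine operator that a
    # parallel reduction would apply in any grouping.
    return tuple(
        tuple(sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )

def run_recurrence(a, b, x0):
    # Later iterations act after earlier ones, so multiply the step
    # matrices in reverse order. Any parenthesization of this product
    # yields the same matrix, enabling divide-and-conquer parallelism.
    m = reduce(matmul,
               (step_matrix(ai, bi)
                for ai, bi in zip(reversed(a), reversed(b))))
    return m[0][0] * x0 + m[0][1]
```

A sequential loop and the matrix formulation compute the same value; the gain is that the matrix product, unlike the original loop, can be split across processors.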
