research-article

A framework for enhancing data reuse via associative reordering

Published: 09 June 2014

Abstract

The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and to a more limited extent in optimizing compilers.

In this paper, we develop a novel framework that uses the associativity and commutativity of operations in regular loop computations to enhance register reuse. Stencils are an important class of computations to which the optimization framework can be applied to improve performance. We show how stencil operations can be implemented to better exploit register reuse and reduce loads and stores. We develop a multi-dimensional retiming formalism to characterize the space of valid implementations in conjunction with other program transformations. Experimental results demonstrate the effectiveness of the framework on a collection of high-order stencils.
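To make the idea of reordering an associative accumulation concrete, the following is a minimal sketch for a one-dimensional stencil. It is an illustration under stated assumptions, not the paper's implementation: the function names `stencil_gather` and `stencil_rotating`, the radius `R`, and the weight layout `w` are all hypothetical. The first function computes each output by reading all of its neighbours. The second loads each input exactly once, adds its weighted contribution to a small window of partial sums, and stores each output once when its sum is complete. The second form is only valid because `+` is associative and commutative, so one output's sum may be built up across several loop iterations.

```c
#define R 2 /* stencil radius; an illustrative choice, not from the paper */

/* Gather form: every output reads its 2R+1 neighbours from memory. */
void stencil_gather(const double *in, double *out, int n, const double *w) {
    for (int i = R; i < n - R; ++i) {
        double acc = 0.0;
        for (int k = -R; k <= R; ++k)
            acc += w[k + R] * in[i + k];
        out[i] = acc;
    }
}

/* Reordered form: each input is loaded once and scattered into a window
 * of partial sums. acc[m] holds the partial sum for output j - R + m. */
void stencil_rotating(const double *in, double *out, int n, const double *w) {
    double acc[2 * R + 1] = {0.0};
    for (int j = 0; j < n; ++j) {
        const double v = in[j]; /* single load of in[j] */
        for (int m = 0; m <= 2 * R; ++m)
            acc[m] += w[2 * R - m] * v; /* weight index is (j - i) + R */
        /* Output j - R has now received all 2R+1 contributions. */
        if (j - R >= R)
            out[j - R] = acc[0];
        /* Slide the window by one output position. */
        for (int m = 0; m < 2 * R; ++m)
            acc[m] = acc[m + 1];
        acc[2 * R] = 0.0;
    }
}
```

Both functions leave the `R` boundary elements at each end of `out` untouched. With the inner loops fully unrolled, a compiler can plausibly keep `acc` and `v` in registers, which is the kind of reuse the abstract refers to, although whether it does depends on the compiler and target. Reordering floating-point additions can change rounding in general, so any comparison of the two forms on real data should allow for small differences.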



    Published in

      ACM SIGPLAN Notices, Volume 49, Issue 6
      PLDI '14
      June 2014
      598 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2666356
      Editor: Andy Gill

      PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2014
      619 pages
      ISBN: 9781450327848
      DOI: 10.1145/2594291

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
