A step towards unifying schedule and storage optimization

Published: 01 October 2007

Abstract

We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule for a given storage mapping, and a good storage mapping that is valid for all legal (one-dimensional affine) schedules. We consider storage mappings that collapse one dimension of a multidimensional array, and programs that are in a single assignment form and accept a one-dimensional affine schedule. Our method combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests. We formulate the constraints imposed by the data dependences and storage mappings as a set of linear inequalities, and apply numerical programming techniques to solve for the shortest occupancy vector. We consider our method to be a first step towards automating a procedure that finds the optimal tradeoff between parallelism and storage space.
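To make the abstract's central idea concrete, here is a minimal illustrative sketch (not the paper's full formulation): for a single loop nest with uniform dependence vectors and a one-dimensional affine schedule θ(i) = τ·i, an occupancy vector v (storage collapsed along v, so iterations i and i+v share a location) is valid when every value is overwritten only after its last use, i.e. τ·v ≥ τ·d for every dependence vector d. The function name, the brute-force search, and the bounded search radius are assumptions made for this sketch; the paper instead encodes such constraints as linear inequalities and solves them with numerical programming techniques.

```python
from itertools import product

def shortest_occupancy_vector(tau, deps, radius=3):
    """Return the shortest integer vector v (each component in
    [-radius, radius]) satisfying tau.v >= tau.d for every dependence
    vector d, with tau.v >= 1 so storage reuse never precedes a use.

    tau  -- coefficients of the 1-D affine schedule theta(i) = tau.i
    deps -- uniform dependence vectors of the loop nest
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # A value produced at i is last read at i + d for the "slowest"
    # dependence; overwriting at i + v is safe once that read has run.
    slack = max(dot(tau, d) for d in deps)
    best = None
    for v in product(range(-radius, radius + 1), repeat=len(tau)):
        if dot(tau, v) >= max(slack, 1):  # liveness constraint
            if best is None or dot(v, v) < dot(best, best):
                best = v  # keep the shortest valid vector seen so far
    return best

# Schedule theta(i, j) = i + j with dependences (1, 0) and (0, 1):
# any unit step along the schedule's wavefront suffices.
print(shortest_occupancy_vector((1, 1), [(1, 0), (0, 1)]))  # -> (0, 1)
```

A brute-force search is used here only to keep the sketch self-contained; replacing it with an LP/ILP solver over the same inequalities is the direction the paper takes.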

