Abstract
We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule for a given storage mapping, and a good storage mapping that is valid for all legal (one-dimensional affine) schedules. We consider storage mappings that collapse one dimension of a multidimensional array, and programs that are in a single assignment form and accept a one-dimensional affine schedule. Our method combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests. We formulate the constraints imposed by the data dependences and storage mappings as a set of linear inequalities, and apply numerical programming techniques to solve for the shortest occupancy vector. We consider our method to be a first step towards automating a procedure that finds the optimal tradeoff between parallelism and storage space.
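As an illustration of the occupancy-vector idea described above, here is a minimal sketch that is not the paper's method: it swaps the linear-programming formulation for brute-force enumeration over a small box of integer vectors. The stencil, the dependence list `DEPS`, and all function names are hypothetical. It assumes uniform dependences, a linear schedule Θ(p) = θ·p, and that reads precede writes within a time step, so an occupancy vector v is treated as valid when θ·v ≥ θ·d for every dependence distance d.

```python
from itertools import product

# Hypothetical stencil: A[i][j] = f(A[i-1][j-1], A[i-1][j], A[i-1][j+1]).
# A value produced at iteration p is consumed at p + d for each d in DEPS.
DEPS = [(1, 1), (1, 0), (1, -1)]


def dot(a, b):
    """Dot product of two 2-D integer vectors."""
    return a[0] * b[0] + a[1] * b[1]


def is_legal(theta):
    """A linear schedule is legal if every consumer runs at least one
    time step after its producer: theta . d >= 1 for all d."""
    return all(dot(theta, d) >= 1 for d in DEPS)


def is_valid_ov(v, theta):
    """Iteration p + v may overwrite the value produced at p only once all
    consumers of p have read it (assumption: reads precede writes within
    a time step, hence the non-strict inequality)."""
    return all(dot(theta, v) >= dot(theta, d) for d in DEPS)


def legal_schedules(bound=3):
    """All legal schedule vectors with integer coefficients in [-bound, bound]."""
    box = product(range(-bound, bound + 1), repeat=2)
    return [t for t in box if is_legal(t)]


def shortest_ov(thetas, radius=4):
    """Shortest (Euclidean) occupancy vector in a small box that is valid
    for every schedule in thetas, or None if the box contains none."""
    box = product(range(-radius, radius + 1), repeat=2)
    ok = [v for v in box
          if v != (0, 0) and all(is_valid_ov(v, t) for t in thetas)]
    return min(ok, key=lambda v: (dot(v, v), v)) if ok else None


print(shortest_ov([(1, 0)]))           # (1, 0): best mapping for schedule theta = (1, 0)
print(shortest_ov([(2, 1)]))           # (1, 1): best mapping for schedule theta = (2, 1)
print(shortest_ov(legal_schedules()))  # (2, 0): valid for every sampled legal schedule
```

For the schedule θ = (1, 0), the vector (1, 0) means that A[i+1][j] can reuse the storage of A[i][j], so the two-dimensional array collapses along i. The last call only checks a finite sample of schedules; proving validity for all legal affine schedules is what the linear-inequality formulation in the abstract is for.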
A step towards unifying schedule and storage optimization