Abstract
We consider the analysis and optimization of code utilizing operations and functions operating on entire arrays. Models are developed for studying the minimization of the number of materializations of array-valued temporaries in basic blocks, each consisting of a sequence of assignment statements involving array-valued variables. We derive lower bounds on the number of materializations required, and develop several algorithms minimizing the number of materializations, subject to a simple constraint on allowable statement rearrangement. In contrast, we also show that when statement rearrangement is unconstrained, minimizing the number of materializations becomes NP-complete, even for very simple basic blocks.
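To make the notion of a materialization concrete, here is a minimal Python sketch. It is an illustration only, not the models or algorithms developed in the paper. The two-statement basic block `T = A + B; C = T * D`, the function names, and the practice of counting each stored array as one materialization are all assumptions chosen for this example.

```python
# Illustrative sketch only: the block "T = A + B; C = T * D" and the
# counting convention below are assumptions, not taken from the paper.

def evaluate_materialized(a, b, d):
    """Evaluate the block statement by statement, storing temporary T in full."""
    stored_arrays = 0
    t = [x + y for x, y in zip(a, b)]   # T = A + B: temporary T is materialized
    stored_arrays += 1
    c = [x * y for x, y in zip(t, d)]   # C = T * D: result C is materialized
    stored_arrays += 1
    return c, stored_arrays


def evaluate_fused(a, b, d):
    """Evaluate the same block element by element, never storing T."""
    # Each element of T lives only as a scalar intermediate value.
    c = [(x + y) * z for x, y, z in zip(a, b, d)]
    return c, 1  # only C is stored


if __name__ == "__main__":
    a, b, d = [1, 2], [3, 4], [5, 6]
    print(evaluate_materialized(a, b, d))
    print(evaluate_fused(a, b, d))
```

Both functions compute the same array `C`, but the second avoids allocating the temporary `T`. Whether a temporary can be avoided in this way generally depends on how the statements of the block use one another's results and on how they may be reordered.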
On minimizing materializations of array-valued temporaries