Abstract
Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Computational complexity will therefore no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will become increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong and Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower-bound analysis for the red/blue pebble game have very limited effectiveness when applied to the CDAGs of real programs, whose computations are composed of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm that derives asymptotic data-access lower bounds for programs as a function of the problem size and cache size.
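To make the cost model concrete, the following is a minimal Python sketch of the red/blue pebble game mentioned above, using the standard four moves (load, store, compute, delete) with a limit on the number of red pebbles. It only replays a given move sequence and counts its data transfers; it is not the lower-bound technique or the static analysis developed in the paper. The function name `play_red_blue`, the move encoding, and the small example CDAG are all hypothetical choices made for illustration.

```python
def play_red_blue(preds, inputs, outputs, moves, num_red):
    """Replay red/blue pebble moves on a CDAG and return the I/O count.

    preds   : dict mapping each non-input vertex to its predecessor list
    inputs  : vertices that start with a blue pebble (slow memory)
    outputs : vertices that must hold a blue pebble at the end
    moves   : list of (op, vertex), op in {"load", "store", "compute", "delete"}
    num_red : maximum number of red pebbles (fast-memory capacity)
    """
    red, blue = set(), set(inputs)
    io = 0
    for op, v in moves:
        if op == "load":        # blue -> red, costs one transfer
            if v not in blue:
                raise ValueError(f"cannot load {v}: no blue pebble")
            red.add(v)
            io += 1
        elif op == "store":     # red -> blue, costs one transfer
            if v not in red:
                raise ValueError(f"cannot store {v}: no red pebble")
            blue.add(v)
            io += 1
        elif op == "compute":   # free, but all predecessors must be red
            if v in inputs or any(p not in red for p in preds[v]):
                raise ValueError(f"cannot compute {v}")
            red.add(v)
        elif op == "delete":    # free, releases a red pebble
            red.discard(v)
        else:
            raise ValueError(f"unknown move {op}")
        if len(red) > num_red:
            raise ValueError("too many red pebbles in use")
    if not set(outputs) <= blue:
        raise ValueError("some output has no blue pebble")
    return io


# Hypothetical CDAG: c = a + b, e = c + x, f = e + a.
preds = {"c": ["a", "b"], "e": ["c", "x"], "f": ["e", "a"]}
inputs, outputs = {"a", "b", "x"}, {"f"}

# With 4 red pebbles, "a" can stay in fast memory until f is computed.
moves_s4 = [("load", "a"), ("load", "b"), ("compute", "c"), ("delete", "b"),
            ("load", "x"), ("compute", "e"), ("delete", "c"), ("delete", "x"),
            ("compute", "f"), ("store", "f")]

# With 3 red pebbles, "a" must be evicted and loaded a second time.
moves_s3 = [("load", "a"), ("load", "b"), ("compute", "c"), ("delete", "a"),
            ("delete", "b"), ("load", "x"), ("compute", "e"), ("delete", "c"),
            ("delete", "x"), ("load", "a"), ("compute", "f"), ("store", "f")]

print(play_red_blue(preds, inputs, outputs, moves_s4, num_red=4))  # 4 transfers
print(play_red_blue(preds, inputs, outputs, moves_s3, num_red=3))  # 5 transfers
```

In this toy case the same computation costs one extra transfer when the fast memory shrinks from four values to three. A data-access lower bound asks for the minimum such count over all legal move sequences, which this replay function does not compute.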
References
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications, 32(3):866--901, 2011.
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proc. SPAA'12, pages 77--79, 2012.
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Graph expansion and communication costs of fast matrix multiplication. J. ACM, 59(6):32, 2012.
- A. Barvinok. Computing the Ehrhart polynomial of a convex lattice polytope. Discrete and Computational Geometry, 12:35--48, 1994.
- J. Bennett, A. Carbery, M. Christ, and T. Tao. Finite bounds for Hölder-Brascamp-Lieb multilinear inequalities. Mathematical Research Letters, 55(4):647--666, 2010.
- K. Bergman, S. Borkar, et al. Exascale computing study: Technology challenges in achieving exascale systems. DARPA IPTO, Tech. Rep, 2008.
- G. Bilardi and E. Peserico. A characterization of temporal locality and its portability across memory hierarchies. Automata, Languages and Programming, pages 128--139, 2001.
- G. Bilardi and F. P. Preparata. Processor-Time Tradeoffs under Bounded-Speed Message Propagation: Part II, Lower Bounds. Theory Comput. Syst., 32(5):531--559, 1999.
- G. Bilardi, A. Pietracaprina, and P. D'Alberto. On the space and access complexity of computation DAGs. In Graph-Theoretic Concepts in Computer Science, volume 1928 of LNCS, pages 81--92. 2000.
- G. Bilardi, M. Scquizzato, and F. Silvestri. A lower bound technique for communication on BSP with application to the FFT. In Euro-Par, pages 676--687, 2012.
- M. Christ, J. Demmel, N. Knight, T. Scanlon, and K. Yelick. Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays Part 1. EECS Technical Report EECS-2013-61, UC Berkeley, May 2013.
- S. A. Cook. An observation on time-storage trade off. J. Comput. Syst. Sci., 9(3):308--316, 1974.
- J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. Communication-optimal parallel and sequential QR and LU factorizations. SIAM J. Scientific Computing, 34(1), 2012.
- V. Elango, F. Rastello, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan. Data access complexity: The red/blue pebble game revisited. Technical report, OSU/INRIA/LSU/UCLA, Sept. 2013. OSU-CISRC-7/13-TR16.
- V. Elango, F. Rastello, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan. On characterizing the data movement complexity of computational DAGs for parallel execution. In 26th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '14, pages 296--306, 2014.
- P. Feautrier. Parametric integer programming. RAIRO Recherche Operationnelle, 22(3):243--268, 1988.
- S. H. Fuller and L. I. Millett. The Future of Computing Performance: Game Over or Next Level? The National Academies Press, 2011.
- S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Intl. J. of Parallel Programming, 34(3), 2006.
- G. Gupta and S. Rajopadhye. The Z-polyhedral model. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'07), pages 237--248. ACM, 2007.
- J.-W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In Proc. of the 13th Annual ACM Symposium on Theory of Computing (STOC'81), pages 326--333. ACM, 1981.
- D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004.
- A. Ketterlin and P. Clauss. Profiling data-dependence to assist parallelization: Framework, scope, and optimization. In MICRO, pages 437--448, 2012.
- L. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bull. Am. Math. Soc., 55:961--962, 1949.
- D. Ranjan and M. Zubair. Vertex isoperimetric parameter of a computation graph. Int. J. Found. Comput. Sci., 23(4):941--964, 2012.
- D. Ranjan, J. Savage, and M. Zubair. Strong I/O lower bounds for binomial and FFT computation graphs. In Computing and Combinatorics, volume 6842 of LNCS, pages 134--145. Springer, 2011.
- D. Ranjan, J. E. Savage, and M. Zubair. Upper and lower I/O bounds for pebbling r-pyramids. J. Discrete Algorithms, 14:2--12, 2012.
- J. Savage. Extending the Hong-Kung model to memory hierarchies. In Computing and Combinatorics, volume 959 of LNCS, pages 270--281. 1995.
- J. E. Savage. Models of Computation. Addison-Wesley, 1998.
- J. E. Savage and M. Zubair. A unified model for multicore architectures. In Proceedings of the 1st International Forum on Next-generation Multicore/Manycore Technologies, page 9. ACM, 2008.
- J. E. Savage and M. Zubair. Cache-optimal algorithms for option pricing. ACM Trans. Math. Softw., 37(1), 2010.
- M. Scquizzato and F. Silvestri. Communication lower bounds for distributed-memory computations. CoRR, abs/1307.1805, 2013.
- J. Shalf, S. Dosanjh, and J. Morrison. Exascale computing technology challenges. High Performance Computing for Computational Science-VECPAR 2010, pages 1--25, 2011.
- E. Solomonik, A. Buluc, and J. Demmel. Minimizing communication in all-pairs shortest paths. In IPDPS, 2013.
- S. I. Valdimarsson. The Brascamp-Lieb polyhedron. Can. J. Math., 62(4):870--888, 2010.
- L. G. Valiant. A bridging model for multi-core computing. J. Comput. Syst. Sci., 77:154--166, Jan. 2011.
- S. Verdoolaege. isl: An integer set library for the polyhedral model. In Mathematical Software-ICMS 2010, pages 299--302. Springer, 2010.
On Characterizing the Data Access Complexity of Programs
POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages