
Automatic Storage Optimization for Arrays

Published: 08 April 2016

Abstract

Efficient memory allocation is crucial for data-intensive applications, as a smaller memory footprint ensures better cache performance and allows one to run a larger problem size given a fixed amount of main memory. In this article, we describe a new automatic storage optimization technique to minimize the dimensionality and storage requirements of arrays used in sequences of loop nests with a predetermined schedule. We formulate the problem of intra-array storage optimization as one of finding the right storage partitioning hyperplanes: each storage partition corresponds to a single storage location. Our heuristic is driven by a dual-objective function that minimizes both the dimensionality of the mapping and the extents along those dimensions. The technique is dimension-optimal for most codes encountered in practice. The storage requirements of the mappings obtained are also asymptotically better than those obtained by any existing schedule-dependent technique. Storage reduction factors and other results that we report from an implementation of our technique demonstrate its effectiveness on several real-world examples drawn from the domains of image processing, stencil computations, and high-performance computing, as well as on tiled codes in general.
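To make the idea of a storage mapping concrete, the following Python sketch runs a 1-D, 3-point stencil over T time steps in a fixed order (t outer, i inner ascending). It is an illustration only, not the article's heuristic or implementation. Logically, the stencil writes a (T+1) × N array A[t][i], and the `mapping` parameter decides where each element (t, i) physically lives. Elements sent to the same location form one storage partition. The names `run_stencil` and `mapping`, and the integer-sum stencil itself, are hypothetical choices for this example. The modulo mapping (t, i) ↦ (t mod 2, i) cuts storage from (T+1)N cells to 2N and still reproduces the full-array result. Collapsing everything onto one row with (t, i) ↦ (0, i) overwrites values that are still live, so it changes the answer.

```python
from typing import Callable, List, Tuple

# Hypothetical storage mapping: logical element (t, i) -> physical (row, col).
Mapping = Callable[[int, int], Tuple[int, int]]


def run_stencil(init: List[int], steps: int, mapping: Mapping,
                shape: Tuple[int, int]) -> List[int]:
    """Run a 1-D 3-point stencil for `steps` time steps under a fixed schedule.

    Logical semantics: A[0] = init, and for t >= 1
        A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1]  (interior i)
        A[t][i] = A[t-1][i]                              (boundary i)
    Physical storage: element (t, i) lives at buf[row][col] = mapping(t, i),
    where buf has `shape` = (rows, cols). Iterations run t-major, i ascending.
    """
    n = len(init)
    rows, cols = shape
    buf = [[0] * cols for _ in range(rows)]

    def load(t: int, i: int) -> int:
        r, c = mapping(t, i)
        return buf[r][c]

    def store(t: int, i: int, v: int) -> None:
        r, c = mapping(t, i)
        buf[r][c] = v

    for i in range(n):
        store(0, i, init[i])
    for t in range(1, steps + 1):
        for i in range(n):
            if i == 0 or i == n - 1:
                v = load(t - 1, i)
            else:
                v = load(t - 1, i - 1) + load(t - 1, i) + load(t - 1, i + 1)
            store(t, i, v)  # all reads for this instance happen before the write
    return [load(steps, i) for i in range(n)]


init = [0, 0, 3, 0, 0]
T, N = 2, len(init)

# Full storage: (T + 1) * N cells, one per logical element.
full = run_stencil(init, T, lambda t, i: (t, i), (T + 1, N))
# Modulo mapping: 2 * N cells; even and odd time steps share rows.
mod2 = run_stencil(init, T, lambda t, i: (t % 2, i), (2, N))
# Over-contracted mapping: N cells; overwrites values that are still live.
one_row = run_stencil(init, T, lambda t, i: (0, i), (1, N))

print(full)     # [0, 6, 9, 6, 0]
print(mod2)     # [0, 6, 9, 6, 0]
print(one_row)  # [0, 9, 21, 27, 0]
```

The mod-2 mapping is valid here because every value A[t][i] is last read during step t+1, before step t+2 reuses its location. Finding such mappings automatically, with as few dimensions and as small extents as possible, is the problem the abstract describes.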

