Abstract
Efficient memory allocation is crucial for data-intensive applications: a smaller memory footprint improves cache performance and allows a larger problem size to be run within a fixed amount of main memory. In this article, we describe a new automatic storage optimization technique that minimizes the dimensionality and storage requirements of arrays used in sequences of loop nests with a predetermined schedule. We formulate the problem of intra-array storage optimization as one of finding the right storage partitioning hyperplanes, where each storage partition corresponds to a single storage location. Our heuristic is driven by a dual-objective function that minimizes both the dimensionality of the mapping and the extents along those dimensions. The technique is dimension optimal for most codes encountered in practice. The storage requirements of the mappings obtained are also asymptotically better than those obtained by any existing schedule-dependent technique. Storage reduction factors and other results from an implementation of our technique demonstrate its effectiveness on several real-world examples drawn from image processing, stencil computations, high-performance computing, and the class of tiled codes in general.
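As a rough illustration of what an intra-array storage mapping achieves (this is not the heuristic described in the article), consider a time-iterated 1-D three-point stencil executed in its original time-then-space order. Under that schedule, each value of time step t is only read while computing step t + 1, so the array element A[t][i] can be stored at B[t mod 2][i]. All iterations with the same i and the same t mod 2 then share one storage location, and the storage extent along the time dimension drops from steps + 1 to 2. The function names, the stencil, and the boundary handling below are assumptions chosen for the sketch.

```python
def jacobi_full(a0, steps):
    """Reference version: one row per time step, (steps + 1) * n cells."""
    n = len(a0)
    A = [[0.0] * n for _ in range(steps + 1)]
    A[0] = list(a0)
    for t in range(1, steps + 1):
        # Boundary cells are simply carried forward (an assumption).
        A[t][0] = A[t - 1][0]
        A[t][n - 1] = A[t - 1][n - 1]
        for i in range(1, n - 1):
            A[t][i] = (A[t - 1][i - 1] + A[t - 1][i] + A[t - 1][i + 1]) / 3.0
    return A[steps]


def jacobi_contracted(a0, steps):
    """Contracted version: A[t][i] lives at B[t % 2][i], 2 * n cells."""
    n = len(a0)
    B = [list(a0), [0.0] * n]
    for t in range(1, steps + 1):
        cur, prev = B[t % 2], B[(t - 1) % 2]
        cur[0] = prev[0]
        cur[n - 1] = prev[n - 1]
        for i in range(1, n - 1):
            cur[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return B[steps % 2]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0, 8.0, 16.0]
    # Both versions perform the same arithmetic in the same order.
    print(jacobi_full(x, 5) == jacobi_contracted(x, 5))
```

The contraction is valid only for this particular schedule. A different execution order, such as a tiled one, can keep values from more time steps live at once and would need a different mapping, which is why the technique is described as schedule-dependent.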