Abstract
Embedded computing platforms are often resource constrained, so their design and implementation demand careful attention to memory-, power-, and heat-related parameters. An important task for a compiler on such platforms is to simplify the development of applications for limited-memory devices and resource-constrained clients. Focusing on array-intensive embedded applications that execute on single-CPU architectures, this work explores how loop-based compiler optimizations can be used to increase memory location reuse. Our goal is to transform a given application so that the resulting code has fewer cases than the original code in which the lifetimes of array elements overlap. The shorter lifetimes of array elements can then be exploited by reusing memory locations as much as possible. Our experimental results indicate that the proposed strategy reduces the data space requirements of 15 resource-constrained applications by more than 40%, on average. We also demonstrate how this strategy can be combined with techniques that enhance data locality (cache behavior), so that a compiler can take advantage of both, that is, reduce data memory requirements and improve data locality at the same time.
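To make the idea of overlapping lifetimes concrete, the following is a minimal sketch, not the paper's algorithm. The abstract does not say which loop transformations are applied, so loop fusion is assumed here as one familiar example, and the function names `original` and `fused` are hypothetical. In the first version every element of the temporary array stays live between the two loops; in the fused version each temporary value dies in the iteration that produced it, so a single location can be reused.

```python
def original(a):
    """Two separate loops: all n temporaries are live between the loops."""
    n = len(a)
    tmp = [0] * n                  # n temporary locations with overlapping lifetimes
    for i in range(n):
        tmp[i] = a[i] * a[i]       # producer loop
    out = [0] * n
    for i in range(n):
        out[i] = tmp[i] + 1        # consumer loop
    return out, n                  # second value: temporary locations used


def fused(a):
    """Fused loop: each temporary is produced and consumed in one iteration."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        t = a[i] * a[i]            # one location, reused on every iteration
        out[i] = t + 1
    return out, 1                  # a single temporary location suffices


if __name__ == "__main__":
    data = [1, 2, 3]
    print(original(data))
    print(fused(data))
```

Both versions compute the same output; only the amount of temporary storage differs, which is the kind of data space reduction the abstract describes.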
Index Terms
Reducing memory requirements of resource-constrained applications