Reducing memory requirements of resource-constrained applications

Published: 22 April 2009

Abstract

Embedded computing platforms are often resource constrained, requiring careful attention in design and implementation to memory-, power-, and heat-related parameters. An important task for a compiler on such platforms is to simplify the process of developing applications for limited-memory devices and resource-constrained clients. Focusing on array-intensive embedded applications to be executed on single-CPU architectures, this work explores how loop-based compiler optimizations can be used to increase memory location reuse. Our goal is to transform a given application so that the resulting code has fewer cases, compared to the original code, in which the lifetimes of array elements overlap. The reduction in lifetimes of array elements can then be exploited by reusing memory locations as much as possible. Our experimental results indicate that the proposed strategy reduces the data space requirements of 15 resource-constrained applications by more than 40%, on average. We also demonstrate how this strategy can be combined with techniques that enhance data locality (cache behavior), so that a compiler can take advantage of both, that is, reduce data memory requirements and improve data locality at the same time.
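The link between overlapping lifetimes and memory demand can be illustrated with a small sketch. It is not the algorithm proposed in the article. It only assumes a hypothetical temporary array whose elements are each defined once and then used once, and it compares two execution orders: a producer loop followed by a separate consumer loop, and a fused loop in which each element is consumed immediately after it is produced. The function name `peak_live` and the event encoding are assumptions made for this illustration.

```python
def peak_live(schedule):
    """Return the peak number of simultaneously live array elements.

    `schedule` is a list of ("def", i) or ("use", i) events in execution
    order. An element is live from its definition until its last use.
    """
    # First pass: record the time step of the last use of each element.
    last_use = {}
    for t, (kind, idx) in enumerate(schedule):
        if kind == "use":
            last_use[idx] = t

    # Second pass: track the live set and remember its largest size.
    live, peak = set(), 0
    for t, (kind, idx) in enumerate(schedule):
        if kind == "def":
            live.add(idx)
        peak = max(peak, len(live))
        if kind == "use" and last_use[idx] == t:
            live.discard(idx)  # lifetime ends at the last use
    return peak


N = 8  # hypothetical array size

# Original order: one loop defines every element, a second loop uses them.
separate = [("def", i) for i in range(N)] + [("use", i) for i in range(N)]

# Transformed (fused) order: each element is used right after its definition.
fused = [event for i in range(N) for event in (("def", i), ("use", i))]

print(peak_live(separate))  # 8: all lifetimes overlap, N locations needed
print(peak_live(fused))     # 1: no lifetimes overlap, one location suffices
```

In this toy case the reordered loop needs a single memory location instead of N, which is the kind of saving that memory location reuse aims for. Real loop nests carry data dependences that restrict which reorderings are legal.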
