Direct address translation for virtual memory in energy-efficient embedded systems

Published: 04 January 2009
Abstract

This article presents a methodology for virtual memory support in energy-efficient embedded systems. A holistic approach is proposed, in which the combined efforts of the compiler, operating system, and hardware architecture achieve significant reductions in system power. The application information extracted and analyzed by the compiler is used dynamically by the microarchitecture and the operating system to perform energy-efficient and, for many memory references, time-deterministic address translations. We demonstrate that by using application information about the virtual memory layout, an efficient and conflict-free translation process can be implemented with a small hardware direct translation table (DTT) accessed in an application-specific manner. The set of virtual pages is partitioned into groups, such that for each group only a few of the least significant bits are used as an index to obtain the physical page number. We outline an efficient compile-time algorithm for identifying these groups and allocating their translation entries optimally in the DTT. The introduced hardware is minimal in terms of area, performance, and power overhead, while offering the flexibility of software programmability. This is achieved through a small set of registers and tables, which are made software accessible. We have quantitatively evaluated the proposed methodology on a number of embedded applications, including voice, image, and video processing.
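
The group-based lookup described above can be illustrated with a minimal sketch. It is not the article's implementation: the page size, the `Group` record, the `dtt_base` slot offset, and the function names are all assumptions made for illustration. The abstract also does not say how the hardware selects the group for a given memory reference, so the caller passes the group explicitly here.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumption: 4 KiB pages


@dataclass(frozen=True)
class Group:
    """Hypothetical record for one group of virtual pages."""
    index_bits: int   # number of least significant VPN bits used as the index
    dtt_base: int     # first DTT slot reserved for this group (assumed layout)
    vpns: frozenset   # virtual page numbers that belong to the group


def is_conflict_free(vpns, index_bits):
    """True if no two pages in the set share the same low index bits."""
    mask = (1 << index_bits) - 1
    low = [vpn & mask for vpn in vpns]
    return len(set(low)) == len(low)


def build_dtt(groups, page_table, dtt_size):
    """Fill a direct translation table from a VPN -> PPN mapping."""
    dtt = [None] * dtt_size
    for g in groups:
        if not is_conflict_free(g.vpns, g.index_bits):
            raise ValueError("pages in a group collide on their index bits")
        if g.dtt_base + (1 << g.index_bits) > dtt_size:
            raise ValueError("group does not fit in the DTT")
        mask = (1 << g.index_bits) - 1
        for vpn in g.vpns:
            slot = g.dtt_base + (vpn & mask)
            if dtt[slot] is not None:
                raise ValueError("groups overlap in the DTT")
            dtt[slot] = page_table[vpn]
    return dtt


def translate(vaddr, group, dtt):
    """Translate a virtual address with a single indexed DTT read."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    mask = (1 << group.index_bits) - 1
    ppn = dtt[group.dtt_base + (vpn & mask)]
    return (ppn << PAGE_BITS) | offset
```

Because each group is checked for index collisions when the table is built, a lookup in this sketch needs no tag comparison: one indexed read returns the physical page number, which is one way to read the abstract's claim of conflict-free, time-deterministic translation.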
