Abstract
This article presents a methodology for virtual memory support in energy-efficient embedded systems. A holistic approach is proposed, in which the combined efforts of the compiler, operating system, and hardware architecture achieve significant reductions in system power. The application information extracted and analyzed by the compiler is used dynamically by the microarchitecture and the operating system to perform energy-efficient and, for many memory references, time-deterministic address translations. We demonstrate that, by using application information about the virtual memory layout, an efficient and conflict-free translation process can be implemented with a small hardware direct translation table (DTT) accessed in an application-specific manner. The set of virtual pages is partitioned into groups such that, for each group, only a few of the least significant bits are used as an index to obtain the physical page number. We outline an efficient compile-time algorithm for identifying these groups and allocating their translation entries optimally in the DTT. The introduced hardware is minimal in terms of area, performance, and power overhead, while offering the flexibility of software programmability. This is achieved through a small set of registers and tables that are made software accessible. We have quantitatively evaluated the proposed methodology on a number of embedded applications, including voice, image, and video processing.
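The grouping and lookup idea described in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm or hardware design: the 4 KiB page size, the `Group` record, the fixed `index_bits` per group, and the helper names `build_dtt` and `translate` are all assumptions made for illustration. Pages that share the same upper bits of the virtual page number form one group, so their low bits index the table without conflicts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

PAGE_BITS = 12  # assumption: 4 KiB pages


@dataclass(frozen=True)
class Group:
    """Hypothetical per-group descriptor (one small register set in hardware)."""
    tag: int         # upper VPN bits shared by every page in the group
    index_bits: int  # number of least significant VPN bits used as the index
    base: int        # first DTT slot assigned to this group


def build_dtt(page_table: Dict[int, int],
              index_bits: int) -> Tuple[List[Group], List[Optional[int]]]:
    """Naive grouping: pages with equal upper VPN bits share a group.

    Within a group the low `index_bits` bits are unique per page, so the
    lookup is conflict-free. The article's allocation is described as
    optimal; this sketch makes no such claim.
    """
    groups: List[Group] = []
    dtt: List[Optional[int]] = []
    for tag in sorted({vpn >> index_bits for vpn in page_table}):
        groups.append(Group(tag, index_bits, len(dtt)))
        for low in range(1 << index_bits):
            # Unmapped pages inside the group's range leave an empty slot.
            dtt.append(page_table.get((tag << index_bits) | low))
    return groups, dtt


def translate(vaddr: int, groups: List[Group],
              dtt: List[Optional[int]]) -> Optional[int]:
    """Return the physical address, or None if the DTT cannot translate."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    for g in groups:
        if vpn >> g.index_bits == g.tag:
            ppn = dtt[g.base + (vpn & ((1 << g.index_bits) - 1))]
            return None if ppn is None else (ppn << PAGE_BITS) | offset
    return None  # no group matches: fall back to another translation path


# Example: four mapped virtual pages, two index bits per group.
page_table = {0x10: 0x80, 0x11: 0x81, 0x13: 0x2A, 0x40: 0x07}
groups, dtt = build_dtt(page_table, index_bits=2)
print(hex(translate(0x13ABC, groups, dtt)))  # 0x2aabc
```

In this sketch a failed lookup returns `None`; what a real system does in that case (for example, falling back to a conventional translation path) is left open here.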
Index Terms
Direct address translation for virtual memory in energy-efficient embedded systems