Abstract
Multicore platforms are already prevalent in modern embedded systems, and future embedded platforms will integrate even more processor cores, forming many-core platforms. At the same time, applications are becoming more complex and dynamic, and they try to utilize the resources available on these platforms efficiently. Efficient memory utilization is a key challenge for application developers, because memory is a scarce resource and often becomes the system's bottleneck. To cope with this dynamism and achieve better memory footprint utilization (low memory fragmentation), application developers resort to dynamic memory (heap) management techniques, allocating and deallocating data at runtime. Overall power consumption is another key challenge. To address it, designers employ Dynamic Voltage and Frequency Scaling (DVFS) mechanisms that adapt to the application's computational demands at runtime. In this article, we propose combining dynamic memory management techniques with DVFS. This is done by integrating, within the memory manager, runtime monitoring mechanisms that steer the DVFS mechanisms to adjust clock frequency and supply voltage based on heap performance. The proposed approach has been evaluated on a distributed shared-memory many-core platform composed of multiple LEON3 processors interconnected by a Network-on-Chip infrastructure, supporting DVFS. Experimental results show that monitoring and applying DVFS with the proposed method reduced the power consumption of dynamic memory management by approximately 37%. In addition, we present the trade-offs of the proposed approach. Finally, by combining the developed method with heap fragmentation-aware dynamic memory managers, we achieve low heap fragmentation together with low power consumption.
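The idea of a memory manager whose runtime monitor steers DVFS can be illustrated with a small model. The following is only a sketch under stated assumptions, not the article's implementation: the abstract does not say which heap metric is monitored, so the sketch assumes mean cycles per heap request over a fixed window, and the operating points, thresholds, and all names (`MonitoredHeap`, `record`, `relative_dynamic_power`) are hypothetical. The power estimate uses the standard proportionality of CMOS dynamic power to f·V².

```python
from dataclasses import dataclass

# Hypothetical operating points: (frequency in MHz, supply voltage in V).
# Illustrative values only, not taken from the evaluated platform.
OPERATING_POINTS = [(25, 0.8), (50, 1.0), (100, 1.2)]


@dataclass
class MonitoredHeap:
    """Toy model of a heap monitor that picks a DVFS operating point."""
    window: int = 8         # heap requests per monitoring window (assumed)
    high_cycles: int = 200  # mean cycles/request above which to speed up (assumed)
    low_cycles: int = 50    # mean cycles/request below which to slow down (assumed)
    level: int = 1          # current index into OPERATING_POINTS
    _cycles: int = 0        # cycles accumulated in the current window
    _requests: int = 0      # requests seen in the current window

    def record(self, cycles: int) -> None:
        """Account for one allocation or deallocation that took `cycles` cycles."""
        self._cycles += cycles
        self._requests += 1
        if self._requests == self.window:
            self._steer()

    def _steer(self) -> None:
        """At the end of a window, move at most one operating point up or down."""
        mean = self._cycles / self._requests
        if mean > self.high_cycles and self.level < len(OPERATING_POINTS) - 1:
            self.level += 1
        elif mean < self.low_cycles and self.level > 0:
            self.level -= 1
        self._cycles = 0
        self._requests = 0

    def operating_point(self):
        """Return the (frequency, voltage) pair currently selected."""
        return OPERATING_POINTS[self.level]


def relative_dynamic_power(freq_mhz: float, vdd: float) -> float:
    """Dynamic power up to a constant factor: P is proportional to f * V^2."""
    return freq_mhz * vdd ** 2
```

In this model, a window of slow heap requests raises the operating point and a window of fast requests lowers it. Comparing `relative_dynamic_power` for two operating points gives a rough feel for the power side of the trade-off; the direction of the policy and the step size of one level per window are design assumptions of the sketch.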
Index Terms
Power-aware dynamic memory management on many-core platforms utilizing DVFS