Abstract
This work presents a methodology for efficiently exploring data interleaving and data-to-memory mapping options on Single Instruction Multiple Data (SIMD) platform architectures. The system architecture consists of a reconfigurable clustered scratch-pad memory and a SIMD functional unit, which performs the same operation on multiple input data in parallel. Memory accesses contribute substantially to the overall energy consumption of an embedded system executing a data-intensive task. The goal of this work is to reduce the overall energy consumption by increasing the utilization of the functional units and decreasing the number of memory accesses. The methodology is tested on a number of benchmark applications with holes in their access schemes. Potential gains are calculated from energy models for both the processing and the memory parts of the system. For the studied benchmarks, efficient interleaving and mapping of data reduces the energy consumption of the complete system by between 40% and 80%.
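To make the idea of "holes in the access scheme" concrete, the following minimal sketch counts how many memory words a strided access pattern touches before and after the accessed elements are packed contiguously. It is an illustration only, not the methodology of the paper: the function names, the lane count, the stride-2 pattern, and the simple packing rule are all assumptions, and it counts word accesses rather than modelling energy.

```python
def word_accesses(indices, lanes):
    """Count distinct memory words touched, assuming each word
    holds `lanes` consecutive elements (one SIMD operand)."""
    return len({i // lanes for i in indices})


def interleave(indices):
    """Hypothetical re-layout: map each accessed element to a new,
    contiguous position so that no holes remain between them."""
    return {old: new for new, old in enumerate(sorted(set(indices)))}


LANES = 4                          # assumed SIMD width in elements
accessed = list(range(0, 64, 2))   # every second element is used; odd indices are holes

# Original layout: half of every fetched word is unused data.
before = word_accesses(accessed, LANES)

# Packed layout: the same elements, stored without holes.
remap = interleave(accessed)
after = word_accesses([remap[i] for i in accessed], LANES)

# Fraction of fetched lanes that carry useful data.
utilization_before = len(accessed) / (before * LANES)
utilization_after = len(accessed) / (after * LANES)

print(before, after, utilization_before, utilization_after)
```

Under these assumptions the packed layout fetches half as many words and fills every SIMD lane with useful data, which is the kind of effect the abstract attributes to interleaving and mapping.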
Index Terms
Integrated Exploration Methodology for Data Interleaving and Data-to-Memory Mapping on SIMD Architectures