
Integrated Exploration Methodology for Data Interleaving and Data-to-Memory Mapping on SIMD Architectures

Published: 23 May 2016

Abstract

This work presents a methodology for the efficient exploration of data interleaving and data-to-memory mapping options for Single Instruction Multiple Data (SIMD) platform architectures. The system architecture consists of a reconfigurable clustered scratch-pad memory and a SIMD functional unit, which performs the same operation on multiple input data in parallel. Memory accesses contribute substantially to the overall energy consumption of an embedded system executing a data-intensive task. This work aims to reduce the overall energy consumption by increasing the utilization of the functional units and decreasing the number of memory accesses. The methodology is evaluated on a set of benchmark applications whose access patterns contain holes, that is, data elements that lie within the accessed range but are never used. Potential gains are calculated using energy models for both the processing and the memory parts of the system. For the studied benchmarks, efficient interleaving and mapping of data reduce the energy consumption of the complete system by 40% to 80%.
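To make the link between holes, memory accesses, and functional-unit utilization concrete, the Python sketch below counts aligned vector loads for two arrays, A and B, that are each read with a fixed stride (so every load also fetches unused elements). It compares storing the arrays back to back with an interleaved layout in which the used elements of B fill the holes of A. This is a toy model under loudly stated assumptions: one word per element, aligned loads of a hypothetical width `width`, only loads are counted, and no energy figures are used. The function names are hypothetical, and this is not the paper's exploration methodology or energy model.

```python
from typing import List


def vector_loads(addresses: List[int], width: int) -> int:
    """Number of aligned vector loads of `width` words needed to fetch all addresses.

    Assumption: a load fetches the aligned block containing an address, so each
    distinct block index (address // width) costs exactly one load.
    """
    return len({a // width for a in addresses})


def utilization(addresses: List[int], width: int) -> float:
    """Fraction of loaded SIMD lanes that carry a used element."""
    return len(addresses) / (vector_loads(addresses, width) * width)


def separate_layout(n: int, stride: int) -> List[int]:
    """Hypothetical baseline: A occupies words [0, n), B occupies words [n, 2n),
    and only every `stride`-th element of each is used (the gaps are the holes)."""
    used = range(0, n, stride)
    return [i for i in used] + [n + i for i in used]


def interleaved_layout(n: int, stride: int) -> List[int]:
    """Hypothetical interleaving: the k-th used element of A is stored at word 2k
    and the k-th used element of B at word 2k + 1, so no holes remain."""
    count = len(range(0, n, stride))
    return [2 * k for k in range(count)] + [2 * k + 1 for k in range(count)]


if __name__ == "__main__":
    n, stride, width = 16, 2, 4  # illustrative sizes only
    for name, layout in (("separate", separate_layout(n, stride)),
                         ("interleaved", interleaved_layout(n, stride))):
        print(name, vector_loads(layout, width), utilization(layout, width))
    # separate 8 0.5
    # interleaved 4 1.0
```

With 16-element arrays read at stride 2 and 4-word loads, the back-to-back layout needs 8 loads with half of the lanes wasted, while the interleaved layout fetches the same 16 useful elements in 4 fully utilized loads. A real exploration would also have to account for the cost of the interleaving itself and for the memory organization; the sketch only isolates the load-count effect.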
