A DWM-Based Stack Architecture Implementation for Energy Harvesting Systems

Published: 27 September 2017

Abstract

Energy harvesting systems typically rely on non-volatile processors to carry out computation under an intermittent power supply. Previous non-volatile processors have been built on register architectures, but a stack architecture, known for its simplicity and small footprint, appears to be a better fit for energy harvesting systems. In this work, Domain Wall Memory (DWM) is used to implement ZPU, the world’s smallest working CPU. DWM offers ultra-high density and SRAM-comparable access latency, and its sequential access structure is well suited to a stack, whose accesses display high temporal locality. Because the performance and energy of DWM are determined by the number of shift operations performed to access the stack, this paper further reduces shift operations through novel data placement and micro-code transformation optimizations. The impact of compiler optimization techniques on the number of shift operations is also investigated in order to select the most effective optimizations for a DWM-based stack machine. Experimental studies confirm the effectiveness of the proposed DWM-based stack architectures in improving the performance and energy efficiency of energy harvesting systems.
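To make the link between stack locality and shift count concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: a single DWM track with one access port, where reaching slot i from the current port position costs |i - head| shifts, and where slot k holds the k-th stack element. The function names `count_shifts` and `stack_trace` are hypothetical and used only for illustration.

```python
def count_shifts(accesses, start=0):
    """Total shifts on a single-port track (assumed cost model).

    Each access to slot i costs |i - head| shifts and leaves the
    port aligned with slot i.
    """
    head = start
    total = 0
    for slot in accesses:
        total += abs(slot - head)
        head = slot
    return total


def stack_trace(ops):
    """Translate a sequence of "push"/"pop" operations into slot indices.

    Assumed layout: a push writes slot sp and then increments sp;
    a pop decrements sp and then reads slot sp.
    """
    sp = 0
    slots = []
    for op in ops:
        if op == "push":
            slots.append(sp)
            sp += 1
        elif op == "pop":
            if sp == 0:
                raise ValueError("pop from empty stack")
            sp -= 1
            slots.append(sp)
        else:
            raise ValueError(f"unknown operation: {op}")
    return slots


if __name__ == "__main__":
    # Stack-like pattern: consecutive accesses stay near the top of stack.
    stack_slots = stack_trace(["push", "push", "pop", "pop", "push"])
    # Scattered pattern, as arbitrary random-access storage might produce.
    scattered_slots = [0, 7, 2, 5, 1]
    print(stack_slots, count_shifts(stack_slots))
    print(scattered_slots, count_shifts(scattered_slots))
```

Under this toy model, the stack-like trace needs far fewer shifts than the scattered trace of the same length, which is the intuition behind pairing a sequentially accessed memory with a stack. Real DWM arrays may use multiple ports and tracks, so the absolute numbers here carry no meaning beyond the comparison.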
