Abstract
Energy harvesting systems tend to use non-volatile processors to carry out computation under intermittent power supplies. While previous implementations of non-volatile processors are based on register architectures, the stack architecture, known for its simplicity and small footprint, seems to be a better fit for energy harvesting systems. In this work, Domain Wall Memory (DWM) is used to implement ZPU, the world’s smallest working CPU. Not only does DWM offer ultra-high density and SRAM-comparable access latency, but its sequential access structure also makes it well suited to a stack, whose accesses display high temporal locality. Because the performance and energy of DWM are determined by the number of shift operations performed to access the stack, this paper further reduces shift operations through novel data placement and micro-code transformation optimizations. The impact of compiler optimization techniques on the number of shift operations is also investigated in order to select the most effective optimizations for a DWM-based stack machine. Experimental studies confirm the effectiveness of the proposed DWM-based stack architecture in improving the performance and energy efficiency of energy harvesting systems.
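
The link between stack access patterns and shift counts can be illustrated with a toy model. The sketch below is not the paper's placement or micro-code algorithm. It assumes a single DWM tape with one access port, charges one shift per domain of distance between consecutive accesses, and uses the hypothetical helpers `stack_trace` and `count_shifts`, which do not come from the paper.

```python
def stack_trace(ops):
    """Map a sequence of "push"/"pop" operations to the tape domain each one touches.

    Assumption: stack slot i is stored in domain i of a single tape.
    """
    sp = -1  # index of the current top of stack; -1 means empty
    trace = []
    for op in ops:
        if op == "push":
            sp += 1
            trace.append(sp)   # write the new top
        elif op == "pop":
            if sp < 0:
                raise ValueError("pop from empty stack")
            trace.append(sp)   # read the current top
            sp -= 1
        else:
            raise ValueError(f"unknown operation: {op}")
    return trace


def count_shifts(accesses, port_position=0):
    """Total shifts needed to serve `accesses` in order on a single-port tape.

    Assumption: aligning domain j with the port costs |j - current| shifts,
    where `current` is the domain aligned by the previous access.
    """
    shifts = 0
    current = port_position
    for index in accesses:
        shifts += abs(index - current)
        current = index
    return shifts


if __name__ == "__main__":
    stack_like = stack_trace(["push", "push", "pop", "pop", "push"])
    scattered = [0, 7, 1, 6, 2]  # same number of accesses, arbitrary order
    print(stack_like, count_shifts(stack_like))
    print(scattered, count_shifts(scattered))
```

Under these assumptions, consecutive stack accesses land on the same or adjacent domains, so the stack-like trace needs far fewer shifts than a scattered trace of the same length. Shift-reduction optimizations aim to bring real access sequences closer to that pattern.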
Index Terms
A DWM-Based Stack Architecture Implementation for Energy Harvesting Systems