Abstract
Non-volatile memory (NVM) technologies are emerging as a viable alternative to conventional SRAM on account of their higher density and lower leakage power. However, their increased write latency reduces their chances as a substitute for SRAM. To mitigate this problem, hybrid STT-RAM/SRAM architectures have been proposed, in which a small SRAM partition is incorporated alongside a large STT-RAM partition to handle the write operations. However, the performance gain obtained from such an architecture falls short of expectations because the smaller SRAM partition causes a higher miss rate. This, in turn, may limit the effective cache capacity.
This article attempts to reduce the miss penalty and improve the average memory access time by retaining the blocks evicted from the hybrid cache in a small, fully associative SRAM structure called the victim cache. The victim cache is accessed on a miss in the primary hybrid cache. A hit in the victim cache requires an exchange of blocks between the main hybrid cache and the victim cache. In such cases, to place the required block in the appropriate region of the main hybrid cache, we propose an access-based block placement technique. In addition, to manage the runtime load and the uneven evictions of the SRAM partition, we present a dynamic region-based victim cache partitioning method that dedicates victim cache space to the victims of each region. Experimental evaluation on a full-system simulator shows significant improvement in performance and execution time, along with a reduction in the overall miss rate. The proposed policy also increases the endurance of Hybrid Cache Architectures (HCA) by reducing writes to the STT-RAM partition.
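The flow described above (look up the victim cache on a primary miss, exchange blocks on a victim hit, pick a region for the returning block, and resize per-region victim space at runtime) can be illustrated with a small sketch. This is not the article's implementation: the single-set model, the names `VictimCache`, `HybridCache`, `choose_region`, and `rebalance`, the `WRITE_THRESHOLD` value, and the proportional-to-evictions resizing rule are all assumptions made only for illustration.

```python
from collections import OrderedDict

WRITE_THRESHOLD = 1  # hypothetical: blocks with this many writes go to SRAM


def choose_region(writes):
    """Assumed access-based placement: written blocks prefer SRAM."""
    return "sram" if writes >= WRITE_THRESHOLD else "stt"


class VictimCache:
    """Small fully associative LRU buffer with one partition per region."""

    def __init__(self, capacity, sram_share=0.5):
        self.capacity = capacity
        sram = int(capacity * sram_share)
        self.quota = {"sram": sram, "stt": capacity - sram}
        self.entries = {"sram": OrderedDict(), "stt": OrderedDict()}

    def _trim(self, region):
        part = self.entries[region]
        while len(part) > self.quota[region]:
            part.popitem(last=False)  # drop the least recently inserted victim

    def insert(self, addr, region, writes):
        self.entries[region][addr] = writes
        self.entries[region].move_to_end(addr)
        self._trim(region)

    def take(self, addr):
        """Remove and return the write count of addr, or None if absent."""
        for part in self.entries.values():
            if addr in part:
                return part.pop(addr)
        return None

    def rebalance(self, sram_evictions, stt_evictions):
        """Assumed rule: size partitions in proportion to recent evictions."""
        total = sram_evictions + stt_evictions
        if total == 0:
            return
        sram = round(self.capacity * sram_evictions / total)
        sram = min(max(sram, 1), self.capacity - 1)  # keep both regions alive
        self.quota = {"sram": sram, "stt": self.capacity - sram}
        for region in self.entries:
            self._trim(region)


class HybridCache:
    """One set of a hybrid cache: a few SRAM ways and more STT-RAM ways."""

    def __init__(self, sram_ways, stt_ways, victim):
        self.ways = {"sram": sram_ways, "stt": stt_ways}
        self.blocks = {"sram": OrderedDict(), "stt": OrderedDict()}
        self.victim = victim

    def _place(self, addr, writes):
        region = choose_region(writes)
        part = self.blocks[region]
        if len(part) >= self.ways[region]:
            old_addr, old_writes = part.popitem(last=False)  # LRU victim
            self.victim.insert(old_addr, region, old_writes)
        part[addr] = writes

    def access(self, addr, is_write):
        for part in self.blocks.values():
            if addr in part:  # primary hit (no migration in this sketch)
                part[addr] += int(is_write)
                part.move_to_end(addr)
                return "hit"
        writes = self.victim.take(addr)  # consulted only on a primary miss
        outcome = "victim_hit" if writes is not None else "miss"
        self._place(addr, (writes or 0) + int(is_write))
        return outcome
```

In this sketch a victim hit removes the block from the victim cache and the block it displaces in the main cache takes a slot in the victim partition of the same region, which models the exchange. A real design would also account for latencies, tag lookups across many sets, and write-back traffic, none of which are modeled here.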
Index Terms
Improving the Performance of Hybrid Caches Using Partitioned Victim Caching