
Improving the Performance of Hybrid Caches Using Partitioned Victim Caching

Published: 07 December 2020

Abstract

Non-Volatile Memory (NVM) technologies are emerging as a viable alternative to conventional SRAM on account of their high density and low leakage power. However, their higher write latency limits their suitability as a substitute for SRAM. To mitigate this problem, hybrid STT-RAM/SRAM architectures have been proposed, in which a small SRAM partition (a few ways) is added alongside a large STT-RAM partition to handle write operations. However, the performance gain obtained from such an architecture falls short of expectations because the small SRAM partition suffers a higher miss rate. This, in turn, may limit the effective cache capacity.
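The write-aware split described above can be pictured with a small model. This is a hypothetical illustration, not the authors' simulator. It assumes a single set with 2 SRAM ways and 14 STT-RAM ways, sends blocks brought in by a write miss to the SRAM ways and blocks brought in by a read miss to the STT-RAM ways, and manages each partition with its own LRU order. The names `HybridSet`, `access`, and `lookup` are invented for this sketch.

```python
from collections import OrderedDict


class HybridSet:
    """One set of a hypothetical hybrid cache: a few SRAM ways plus more STT-RAM ways.

    Assumed policy (illustrative only): a block fetched on a write miss is placed
    in the SRAM ways, a block fetched on a read miss goes to the STT-RAM ways.
    Each partition keeps its own LRU order.
    """

    def __init__(self, sram_ways=2, stt_ways=14):
        self.capacity = {"SRAM": sram_ways, "STT": stt_ways}
        # Each OrderedDict holds tags in LRU-to-MRU order within its partition
        self.ways = {"SRAM": OrderedDict(), "STT": OrderedDict()}
        self.evictions = {"SRAM": 0, "STT": 0}
        self.stt_writes = 0  # writes (fills and write hits) that land in STT-RAM

    def lookup(self, tag):
        """Return the partition holding `tag`, or None on a miss."""
        for region, blocks in self.ways.items():
            if tag in blocks:
                return region
        return None

    def access(self, tag, is_write):
        """Return ("hit", None) or ("miss", victim), victim being (region, tag) or None."""
        region = self.lookup(tag)
        if region is not None:
            self.ways[region].move_to_end(tag)  # hit: promote to MRU
            if is_write and region == "STT":
                self.stt_writes += 1
            return "hit", None
        # Miss: pick the partition by access type and evict its LRU block if full
        region = "SRAM" if is_write else "STT"
        blocks = self.ways[region]
        victim = None
        if len(blocks) >= self.capacity[region]:
            victim = (region, blocks.popitem(last=False)[0])
            self.evictions[region] += 1
        blocks[tag] = True
        if region == "STT":
            self.stt_writes += 1  # filling an STT-RAM way is itself a write
        return "miss", victim


s = HybridSet(sram_ways=2, stt_ways=14)
for t in range(8):            # 8 distinct blocks that miss on a write
    s.access(t, is_write=True)
for t in range(100, 108):     # 8 distinct blocks that miss on a read
    s.access(t, is_write=False)
print(s.evictions)            # {'SRAM': 6, 'STT': 0}
```

In this toy trace the two SRAM ways churn through six evictions while the STT-RAM ways have spare capacity, which is the imbalance that makes the small SRAM partition the main source of misses.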

This article attempts to reduce the miss penalty and improve the average memory access time by retaining the victims evicted from the hybrid cache in a smaller, fully associative SRAM structure called the victim cache. The victim cache is accessed on a miss in the primary hybrid cache. A hit in the victim cache requires exchanging the block between the main hybrid cache and the victim cache. To place the requested block in the appropriate region of the main hybrid cache in such cases, we propose an access-based block placement technique. In addition, to handle the runtime load and the uneven evictions from the SRAM partition, we present a dynamic, region-based victim cache partitioning method that dedicates a portion of the victim cache to the victims of each region. Experimental evaluation on a full-system simulator shows significant improvements in performance and execution time, along with a reduction in the overall miss rate. The proposed policy also increases the endurance of Hybrid Cache Architectures (HCA) by reducing writes to the STT-RAM partition.
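The miss path described above can be sketched under explicit assumptions. This is not the authors' algorithm. The victim cache is split into an SRAM-victim partition and an STT-RAM-victim partition, each managed with LRU. Every `epoch` insertions, the entries are re-divided in proportion to how many victims each region produced during that epoch, with at least one entry kept per region. The "access-based" placement is approximated by returning a block to SRAM when the triggering access is a write and to STT-RAM otherwise. All names (`PartitionedVictimCache`, `handle_main_miss`, `epoch`) are hypothetical.

```python
from collections import OrderedDict


class PartitionedVictimCache:
    """Hypothetical fully associative SRAM victim cache split per hybrid-cache region."""

    def __init__(self, entries=8, epoch=64):
        self.entries = entries
        self.epoch = epoch  # assumed repartitioning interval, in insertions
        self.quota = {"SRAM": entries // 2, "STT": entries - entries // 2}
        self.parts = {"SRAM": OrderedDict(), "STT": OrderedDict()}  # LRU -> MRU
        self.inserted = {"SRAM": 0, "STT": 0}  # victims received this epoch

    def insert(self, region, tag):
        """Store a block evicted from `region` of the main hybrid cache."""
        part = self.parts[region]
        part[tag] = True
        part.move_to_end(tag)
        while len(part) > self.quota[region]:
            part.popitem(last=False)  # drop this region's LRU victim
        self.inserted[region] += 1
        if sum(self.inserted.values()) >= self.epoch:
            self._repartition()

    def _repartition(self):
        """Resize partitions in proportion to each region's recent evictions."""
        total = sum(self.inserted.values())
        sram = round(self.entries * self.inserted["SRAM"] / total)
        sram = min(max(sram, 1), self.entries - 1)  # keep both partitions non-empty
        self.quota = {"SRAM": sram, "STT": self.entries - sram}
        for region, part in self.parts.items():
            while len(part) > self.quota[region]:
                part.popitem(last=False)
        self.inserted = {"SRAM": 0, "STT": 0}

    def probe(self, tag):
        """On a main-cache miss: remove `tag` and return its region, or None."""
        for region, part in self.parts.items():
            if tag in part:
                del part[tag]
                return region
        return None


def handle_main_miss(vc, tag, is_write, main_victim=None):
    """Hypothetical miss path; `main_victim` is the (region, tag) the hybrid cache evicts."""
    found = vc.probe(tag) is not None       # victim cache is checked only after a main miss
    target = "SRAM" if is_write else "STT"  # assumed access-based placement rule
    if main_victim is not None:
        vc.insert(*main_victim)             # exchange: the displaced block becomes a victim
    return ("victim-cache hit" if found else "fetch from memory", target)


vc = PartitionedVictimCache(entries=8, epoch=8)
for t in range(6):
    vc.insert("SRAM", t)        # the SRAM region evicts heavily
vc.insert("STT", 100)
vc.insert("STT", 101)           # 8th insertion triggers repartitioning
print(vc.quota)                 # {'SRAM': 6, 'STT': 2}
print(handle_main_miss(vc, 100, is_write=True, main_victim=("STT", 200)))
# ('victim-cache hit', 'SRAM')
```

Because the victim cache is probed only after a hybrid-cache miss, it shortens the effective miss penalty rather than the hit time. With hit time t_h, hybrid-cache miss rate m, victim-cache lookup time t_vc, local victim-cache miss ratio m_vc, and memory latency t_mem, and ignoring swap overhead, AMAT ≈ t_h + m · (t_vc + m_vc · t_mem).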
