research-article

DSM: A Case for Hardware-Assisted Merging of DRAM Rows with Same Content

Published: 12 June 2020

Abstract

The number of cores and the capacity of main memory in modern systems have been growing significantly. In particular, memory scaling, although slower than computation scaling, has enabled very large DRAMs with terabytes (TBs) of capacity. Consequently, addressing the performance and energy consumption bottlenecks of DRAM is more important than ever. DRAM refresh is one of the main contributors to memory overheads, especially for the large-capacity DRAMs used in modern servers and emerging large-scale data centers. This paper addresses the memory refresh problem by leveraging the fact that most cloud servers host virtualized systems that use similar kernels, libraries, and the like. We propose and experimentally evaluate a novel approach that exploits this observation to reduce the DRAM refresh overhead in such systems. More specifically, we present DSM, a lightweight hardware extension to the memory controller that detects pages with the same content in memory, refreshes only one of them, and redirects requests for the others to that page. Our detailed experimental analysis shows that the proposed DSM design can reduce 99th-percentile memory access latency by up to 2.01x and overall memory energy consumption by up to 8.5%.
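The idea summarized in the abstract (detect rows with identical content, refresh only one copy, redirect accesses to it) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware design evaluated in the paper: the class `ToyRowMerger`, its methods, and the hash-then-compare detection step are all hypothetical names and choices made for illustration, and the abstract does not say how DSM detects identical content or handles writes.

```python
import hashlib


class ToyRowMerger:
    """Toy software model of content-based row merging.

    Assumption: this is an illustration only, not the DSM hardware design.
    For simplicity the model keeps the logical content of every row, even
    for rows that a real design would stop refreshing.
    """

    def __init__(self, rows):
        self.rows = list(rows)   # one bytes object per row (hypothetical layout)
        self.redirect = {}       # merged row index -> index of the row that is kept
        self._rebuild()

    def _rebuild(self):
        """Recompute which rows share content with an earlier row."""
        self.redirect.clear()
        seen = {}                # content digest -> first row index with that content
        for idx, data in enumerate(self.rows):
            digest = hashlib.sha256(data).digest()
            owner = seen.get(digest)
            # A full comparison guards against merging rows on a hash collision.
            if owner is not None and self.rows[owner] == data:
                self.redirect[idx] = owner
            else:
                seen.setdefault(digest, idx)

    def read(self, idx):
        """Serve a read from the kept copy if this row was merged."""
        return self.rows[self.redirect.get(idx, idx)]

    def write(self, idx, data):
        """A write changes content, so sharing is recomputed from scratch."""
        self.rows[idx] = data
        self._rebuild()

    def rows_to_refresh(self):
        """Only rows that are not redirected elsewhere need refreshing."""
        return [i for i in range(len(self.rows)) if i not in self.redirect]


merger = ToyRowMerger([b"A" * 8, b"B" * 8, b"A" * 8, b"A" * 8])
print(merger.rows_to_refresh())  # rows 2 and 3 are redirected to row 0
```

Rebuilding the whole mapping on every write is a simplification that keeps the sketch short; a hardware mechanism would need an incremental way to break sharing, which is outside what the abstract describes.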

Published in

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 4, Issue 2
SIGMETRICS
June 2020
623 pages
EISSN: 2476-1249
DOI: 10.1145/3405833

            Copyright © 2020 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 12 June 2020
            • Online AM: 7 May 2020
