Abstract
The number of cores and the capacity of main memory in modern systems have been growing significantly. Memory scaling, although slower than computation scaling, has enabled very large DRAMs with terabytes (TBs) of capacity. Consequently, addressing the performance and energy bottlenecks of DRAM is more important than ever. The DRAM refresh operation is one of the main contributors to memory overhead, especially for the large-capacity DRAMs used in modern servers and emerging large-scale data centers. This paper addresses the refresh problem by leveraging the fact that most cloud servers host virtualized systems that use similar kernels, libraries, and other software, so many memory pages hold identical content. We propose and experimentally evaluate a novel approach that exploits this observation to reduce DRAM refresh overhead in such systems. Specifically, we present DSM, a lightweight hardware extension to the memory controller that detects pages with the same content in memory, refreshes only one of them, and redirects requests for the others to that page. Our detailed experimental analysis shows that the proposed DSM design can reduce 99th-percentile memory access latency by up to 2.01x and overall memory energy consumption by up to 8.5%.
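The merge-and-redirect idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the DSM hardware design: the function names (`merge_rows`, `rows_to_refresh`, `read_row`), the use of SHA-256 as the content fingerprint, and the dictionary-based redirect table are all hypothetical choices made for illustration, and write handling (un-merging a row when its content changes) is omitted.

```python
import hashlib


def merge_rows(rows):
    """Build a redirect map: row_id -> canonical row_id with identical content.

    `rows` is a dict mapping a row id to its content (bytes). Hypothetical
    model only: real hardware would use its own detection logic.
    """
    canonical_by_digest = {}  # fingerprint -> list of canonical row ids
    redirect = {}
    for row_id in sorted(rows):
        data = rows[row_id]
        digest = hashlib.sha256(data).digest()
        candidates = canonical_by_digest.setdefault(digest, [])
        for canon in candidates:
            # Confirm with a full comparison so a fingerprint collision
            # can never merge rows whose contents differ.
            if rows[canon] == data:
                redirect[row_id] = canon
                break
        else:
            # No existing row has this content: the row is its own canonical copy.
            candidates.append(row_id)
            redirect[row_id] = row_id
    return redirect


def rows_to_refresh(redirect):
    """Only canonical rows need refreshing; merged duplicates are skipped."""
    return set(redirect.values())


def read_row(rows, redirect, row_id):
    """Serve a read for any row from its canonical copy."""
    return rows[redirect[row_id]]


# Five rows, two pairs of which hold identical content.
rows = {
    0: b"kernel" * 4,
    1: b"libc" * 4,
    2: b"kernel" * 4,
    3: b"\x00" * 16,
    4: b"\x00" * 16,
}
redirect = merge_rows(rows)
refresh_set = rows_to_refresh(redirect)
```

In this toy example, rows 2 and 4 are redirected to rows 0 and 3, so only three of the five rows remain in the refresh set while reads to every row still return the original content.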
Index Terms
DSM: A Case for Hardware-Assisted Merging of DRAM Rows with Same Content