research-article

Thermostat: Application-transparent Page Management for Two-tiered Main Memory

Published: 04 April 2017

Abstract

The advent of new memory technologies that are denser and cheaper than commodity DRAM has renewed interest in two-tiered main memory schemes. Infrequently accessed application data can be stored in such memories to achieve significant memory cost savings. Past research on two-tiered main memory has assumed a 4KB page size. However, 2MB huge pages are performance-critical in cloud applications with large memory footprints, especially in virtualized cloud environments, where nested paging drastically increases the cost of 4KB page management. We present Thermostat, an application-transparent, huge-page-aware mechanism to place pages in a dual-technology hybrid memory system while achieving both the cost advantages of two-tiered memory and the performance advantages of transparent huge pages. We present an online page classification mechanism that accurately classifies both 4KB and 2MB pages as hot or cold while incurring no observable performance overhead across several representative cloud applications. We implement Thermostat in Linux kernel version 4.5 and evaluate its effectiveness on representative cloud computing workloads running under KVM virtualization. We emulate slow memory with performance characteristics approximating near-future high-density memory technology and show that Thermostat migrates up to 50% of the application footprint to slow memory while limiting performance degradation to 3%, thereby reducing memory cost by up to 30%.
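The relationship between the fraction of footprint moved to slow memory and the resulting cost reduction can be illustrated with a simple linear cost model. The sketch below is not taken from the paper: the function name `memory_cost_savings` and the price ratio of 0.4 are hypothetical, the ratio being chosen only because it makes this simple model line up with the abstract's figures of 50% migrated and 30% saved. The paper's own cost assumptions may differ.

```python
def memory_cost_savings(cold_fraction: float, slow_to_dram_price_ratio: float) -> float:
    """Fraction of memory cost saved versus an all-DRAM system of equal capacity.

    Assumed linear model (not from the paper): cost is proportional to capacity,
    and slow memory costs `slow_to_dram_price_ratio` times as much per byte as DRAM.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    if slow_to_dram_price_ratio < 0.0:
        raise ValueError("slow_to_dram_price_ratio must be non-negative")
    # Hot data stays in DRAM at full price; cold data is charged at the slow-memory price.
    relative_cost = (1.0 - cold_fraction) + cold_fraction * slow_to_dram_price_ratio
    return 1.0 - relative_cost


if __name__ == "__main__":
    # 50% of the footprint migrated (from the abstract); 0.4 price ratio is hypothetical.
    savings = memory_cost_savings(cold_fraction=0.5, slow_to_dram_price_ratio=0.4)
    print(f"Estimated memory cost savings: {savings:.0%}")
```

Under this model the savings scale with both the migrated fraction and the price gap, so the same 50% migration would save less if slow memory were priced closer to DRAM.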



    • Published in

      ACM SIGPLAN Notices, Volume 52, Issue 4
      ASPLOS '17
      April 2017
      811 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/3093336
      • ACM Conferences
        ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
        April 2017
        856 pages
        ISBN: 9781450344654
        DOI: 10.1145/3037697

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

