Research article
DOI: 10.1145/3307681.3325398

Kleio: A Hybrid Memory Page Scheduler with Machine Intelligence

Published: 17 June 2019

Abstract

The increasing demand of big data analytics for more main memory capacity in datacenters and exascale computing environments is driving the integration of heterogeneous memory technologies. The new technologies exhibit vastly greater differences in access latency, bandwidth, and capacity than traditional NUMA systems. Leveraging this heterogeneity while also delivering application performance improvements requires intelligent data placement. We present Kleio, a page scheduler with machine intelligence for applications that execute across hybrid memory components. Kleio is a hybrid page scheduler: it combines existing lightweight, history-based data tiering methods for hybrid memory with novel intelligent placement decisions based on deep neural networks. We contribute new understanding of the scope of benefits that intelligent page scheduling can achieve over existing history-based approaches, and of the choice of deep learning algorithms and parameters that are effective for this problem space. Kleio incorporates a new page-prioritization method that leads to the highest performance boost while limiting the resulting system resource overheads. Our performance evaluation indicates that Kleio closes, on average, 80% of the performance gap between existing solutions and an oracle with knowledge of future access patterns. Kleio provides hybrid memory systems with fast and effective neural network training and prediction accuracy levels that bring significant application performance improvements with limited resource overheads, laying the groundwork for its practical integration in future systems.
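To make the comparison in the abstract concrete, here is a minimal Python sketch of a history-based page scheduler, an oracle, and a hybrid of the two, together with the "share of the gap closed" metric. Everything in it is an assumption for illustration: the toy access trace, the names `fast_hits`, `history_error` and `stand_in_model`, and the page-selection heuristic are hypothetical, and the stand-in predictor is a placeholder for a trained neural network. It is not Kleio's implementation, and its numbers are not results from the paper.

```python
from typing import Callable, Dict, List, Set

Counts = List[Dict[int, int]]            # counts[t][page] = accesses in period t
Predictor = Callable[[int, int], float]  # (period, page) -> predicted accesses


def top_k(scores: Dict[int, float], k: int) -> Set[int]:
    """Return the k pages with the highest score (ties go to the lower page id)."""
    return set(sorted(scores, key=lambda p: (-scores[p], p))[:k])


def fast_hits(counts: Counts, k: int, predict: Predictor) -> int:
    """Accesses served from fast memory when, at the start of every period
    t >= 1, the k pages with the highest predicted count are placed there."""
    hits = 0
    for t in range(1, len(counts)):
        placed = top_k({p: predict(t, p) for p in counts[t]}, k)
        hits += sum(counts[t][p] for p in placed)
    return hits


def history_error(counts: Counts, page: int) -> int:
    """Hypothetical priority score: how far last-period counts miss for a page."""
    return sum(abs(counts[t][page] - counts[t - 1][page])
               for t in range(1, len(counts)))


# Toy trace (made up): page 0 is always hot, pages 1 and 2 alternate, page 3 is cold.
counts: Counts = [
    {0: 10, 1: 9 if t % 2 == 0 else 0, 2: 0 if t % 2 == 0 else 9, 3: 1}
    for t in range(5)
]
FAST_PAGES = 2  # assumed capacity of the fast tier, in pages


def history(t: int, p: int) -> float:
    """History-based prediction: next period looks like the previous one."""
    return counts[t - 1][p]


def oracle(t: int, p: int) -> float:
    """Oracle: knows the actual access count of the coming period."""
    return counts[t][p]


def stand_in_model(t: int, p: int) -> float:
    """Placeholder for a learned predictor: replays the count from two
    periods back, which happens to capture the alternating pages."""
    return counts[t - 2][p] if t >= 2 else counts[t - 1][p]


# Assumption: hand the two pages that history mispredicts most to the model.
smart_pages = top_k({p: history_error(counts, p) for p in counts[0]}, 2)


def hybrid(t: int, p: int) -> float:
    """Model-based prediction for selected pages, history for the rest."""
    return stand_in_model(t, p) if p in smart_pages else history(t, p)


h = fast_hits(counts, FAST_PAGES, history)
o = fast_hits(counts, FAST_PAGES, oracle)
y = fast_hits(counts, FAST_PAGES, hybrid)
gap_closed = (y - h) / (o - h)
print(f"history={h} hybrid={y} oracle={o} gap closed={gap_closed:.2f}")
```

On this toy trace the history-based scheduler serves 40 accesses from fast memory, the oracle 76, and the hybrid 67, so the hybrid closes 27 of the 36 missing accesses, or 75% of the gap. Only two of the four pages are routed to the stand-in model, which mirrors the idea of limiting overhead by applying the expensive predictor to a prioritized subset of pages.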


Published In

HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
June 2019
278 pages
ISBN: 9781450366700
DOI: 10.1145/3307681

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data tiering
  2. emerging memory technologies
  3. heterogeneous memory systems
  4. hybrid memory systems
  5. long short term memory networks
  6. machine intelligence
  7. machine learning
  8. non volatile memory
  9. page scheduler
  10. recurrent neural networks

Qualifiers

  • Research-article

Conference

HPDC '19

Acceptance Rates

HPDC '19 paper acceptance rate: 22 of 106 submissions (21%)
Overall acceptance rate: 166 of 966 submissions (17%)

Cited By

  • (2024) AutoOS. Proceedings of the 41st International Conference on Machine Learning, 7511-7525. DOI: 10.5555/3692070.3692362. Online publication date: 21-Jul-2024
  • (2024) Telescope. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, 409-424. DOI: 10.5555/3691992.3692018. Online publication date: 10-Jul-2024
  • (2024) Intelligent Hybrid Memory Scheduling Based on Page Pattern Recognition. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-2. DOI: 10.23919/DATE58400.2024.10546893. Online publication date: 25-Mar-2024
  • (2024) Genetic Cache: A Machine Learning Approach to Designing DRAM Cache Controllers in HBM Systems. ACM Journal on Emerging Technologies in Computing Systems 20:3, 1-24. DOI: 10.1145/3676966. Online publication date: 8-Jul-2024
  • (2024) Do Predictors for Resource Overcommitment Even Predict? Proceedings of the 4th Workshop on Machine Learning and Systems, 153-160. DOI: 10.1145/3642970.3655838. Online publication date: 22-Apr-2024
  • (2024) MTM: Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory. Proceedings of the Nineteenth European Conference on Computer Systems, 803-817. DOI: 10.1145/3627703.3650075. Online publication date: 22-Apr-2024
  • (2024) IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 69-82. DOI: 10.1145/3625549.3658659. Online publication date: 3-Jun-2024
  • (2024) Enabling Large Dynamic Neural Network Training with Learning-based Memory Management. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 788-802. DOI: 10.1109/HPCA57654.2024.00066. Online publication date: 2-Mar-2024
  • (2024) PatternS: An intelligent hybrid memory scheduler driven by page pattern recognition. Journal of Systems Architecture 153, 103178. DOI: 10.1016/j.sysarc.2024.103178. Online publication date: Aug-2024
  • (2024) Olsync: Object-level tiering and coordination in tiered storage systems based on software-defined network. Future Generation Computer Systems, 107521. DOI: 10.1016/j.future.2024.107521. Online publication date: Oct-2024
