
RTHMS: a tool for data placement on hybrid memory system

Published: 18 June 2017

Abstract

Traditional scientific and emerging data analytics applications require fast, power-efficient, large, and persistent memories. Combining all these characteristics within a single memory technology is expensive; hence, future supercomputers will feature different memory technologies side by side. However, programming hybrid-memory systems and identifying the best object-to-memory mapping is a complex task. We envision that programmers will likely resort to using default configurations that require only minimal changes to the application code or system settings. In this work, we argue that intelligent, fine-grained data placement can achieve higher performance than default setups.

We present an algorithm for data placement on hybrid-memory systems. Our algorithm is based on a set of single-object allocation rules and global data placement decisions. We also present RTHMS, a tool that implements our algorithm and provides recommendations about the object-to-memory mapping. Our experiments on a hybrid-memory system, an Intel Knights Landing processor with DRAM and HBM, show that RTHMS achieves higher performance than the default configuration. We believe that RTHMS will be a valuable tool for programmers working on complex hybrid-memory systems.
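To make the idea of per-object placement rules concrete, the following is a minimal, illustrative sketch, not RTHMS's actual rule set: each profiled object is ranked by access intensity, and hot, bandwidth-bound objects are greedily assigned to the limited-capacity HBM (16 GiB of MCDRAM on Knights Landing), with everything else falling back to DRAM. The object names, the `bandwidth_bound` flag, and the access counts are hypothetical inputs standing in for what a profiler would collect.

```python
from dataclasses import dataclass

HBM_CAPACITY = 16 * 1024**3  # KNL's MCDRAM offers 16 GiB of HBM

@dataclass
class ObjectProfile:
    name: str
    size: int              # allocation size in bytes
    accesses: int          # total loads/stores observed by profiling
    bandwidth_bound: bool  # streaming, bandwidth-limited access pattern

def recommend_placement(objects):
    """Greedy global decision: prefer HBM for hot, bandwidth-bound
    objects subject to HBM capacity; place everything else in DRAM."""
    placement = {}
    hbm_free = HBM_CAPACITY
    # Rank objects by access intensity (accesses per byte).
    ranked = sorted(objects, key=lambda o: o.accesses / o.size, reverse=True)
    for obj in ranked:
        if obj.bandwidth_bound and obj.size <= hbm_free:
            placement[obj.name] = "HBM"
            hbm_free -= obj.size
        else:
            placement[obj.name] = "DRAM"
    return placement

# Hypothetical profiles: two streaming arrays and a latency-bound table.
objects = [
    ObjectProfile("matrix_A", 8 * 1024**3, accesses=10**9, bandwidth_bound=True),
    ObjectProfile("lookup_table", 4 * 1024**3, accesses=10**6, bandwidth_bound=False),
    ObjectProfile("stream_buf", 2 * 1024**3, accesses=5 * 10**8, bandwidth_bound=True),
]
print(recommend_placement(objects))
# → {'stream_buf': 'HBM', 'matrix_A': 'HBM', 'lookup_table': 'DRAM'}
```

On a real KNL node in flat mode, such a recommendation could be acted on by allocating the HBM-mapped objects with `hbw_malloc` from the memkind library and leaving the rest on `malloc`.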

