Abstract
Traditional scientific and emerging data analytics applications require fast, power-efficient, large, and persistent memories. Combining all of these characteristics in a single memory technology is expensive, so future supercomputers will feature different memory technologies side by side. However, programming hybrid-memory systems and identifying the best object-to-memory mapping is a complex task. We expect that programmers will probably resort to default configurations that require only minimal changes to the application code or system settings. In this work, we argue that intelligent, fine-grained data placement can achieve higher performance than default setups.
We present an algorithm for data placement on hybrid-memory systems. The algorithm combines a set of single-object allocation rules with global data placement decisions. We also present RTHMS, a tool that implements the algorithm and recommends an object-to-memory mapping. Our experiments on a hybrid-memory system, an Intel Knights Landing processor with DRAM and HBM, show that RTHMS achieves higher performance than the default configuration. We believe that RTHMS will be a valuable tool for programmers working on complex hybrid-memory systems.
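The abstract does not spell out the rules themselves, so the following is only an illustrative sketch of how per-object rules and a global decision could fit together. It is not the RTHMS algorithm. The `MemObject` type, the `hbm_benefit` score, the `recommend_placement` function, and the greedy benefit-per-byte policy are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemObject:
    """A program data object (hypothetical type, not part of RTHMS)."""
    name: str
    size_bytes: int
    # Hypothetical score: estimated gain from placing the object in HBM.
    hbm_benefit: float


def recommend_placement(objects, hbm_capacity_bytes):
    """Return a dict mapping each object name to "HBM" or "DRAM".

    Step 1 applies assumed single-object rules; step 2 makes an assumed
    global decision by greedily filling HBM in benefit-per-byte order.
    """
    placement = {}
    candidates = []

    # Step 1: single-object rules (assumed). Objects that cannot fit in
    # HBM at all, or that would not benefit from it, go to DRAM.
    for obj in objects:
        too_large = obj.size_bytes > hbm_capacity_bytes
        no_gain = obj.hbm_benefit <= 0 or obj.size_bytes <= 0
        if too_large or no_gain:
            placement[obj.name] = "DRAM"
        else:
            candidates.append(obj)

    # Step 2: global decision (assumed). The remaining objects compete
    # for the limited HBM capacity; the best benefit per byte goes first.
    candidates.sort(key=lambda o: o.hbm_benefit / o.size_bytes, reverse=True)
    free = hbm_capacity_bytes
    for obj in candidates:
        if obj.size_bytes <= free:
            placement[obj.name] = "HBM"
            free -= obj.size_bytes
        else:
            placement[obj.name] = "DRAM"
    return placement


if __name__ == "__main__":
    GiB = 1024 ** 3
    demo = [
        MemObject("matrix_a", 6 * GiB, 9.0),
        MemObject("matrix_b", 6 * GiB, 6.0),
        MemObject("index", 8 * GiB, 4.0),
    ]
    for name, memory in recommend_placement(demo, 16 * GiB).items():
        print(f"{name}: {memory}")
```

In this sketch the two stages stay separate on purpose: the per-object rules can reject an object without looking at any other object, while the capacity-constrained step has to consider all remaining objects together. A real tool would derive the benefit score from measured access behavior rather than from hand-assigned numbers.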
RTHMS: a tool for data placement on hybrid memory system
Published in ISMM 2017: Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management.