DOI: 10.1145/3030207.3030230

TARUC: A Topology-Aware Resource Usability and Contention Benchmark

Published: 17 April 2017

ABSTRACT

Computer architects have increased hardware parallelism and power efficiency by integrating massively parallel hardware accelerators (coprocessors) into compute systems. Many modern HPC clusters now consist of multi-CPU nodes along with additional hardware accelerators in the form of graphics processing units (GPUs). Each CPU and GPU is integrated with system memory via communication links (QPI and PCIe) and multi-channel memory controllers. The increasing density of these heterogeneous computing systems has resulted in complex performance phenomena, including non-uniform memory access (NUMA) and resource contention, that make application performance hard to predict and tune. This paper presents the Topology-Aware Resource Usability and Contention (TARUC) benchmark. TARUC is a modular, open-source, and highly configurable benchmark useful for profiling dense heterogeneous systems to provide insight for developers who wish to tune application codes for specific systems. Analysis of TARUC performance profiles from a multi-CPU, multi-GPU system is also presented.
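To illustrate the kind of topology effect such a benchmark exposes, one can compare host-to-device copy bandwidth when the source buffer resides on the GPU's local NUMA node versus a remote one. The sketch below is not the TARUC code itself; it is a minimal illustration using CUDA events and libnuma, and the buffer size, repetition count, NUMA node, and GPU index are illustrative assumptions. Building it with something like "nvcc -O2 bw_sketch.cu -lnuma" and rerunning with different node/GPU pairings would show the NUMA sensitivity the paper describes.

// Minimal sketch (assumed parameters): host-to-device bandwidth from a
// buffer bound to a chosen NUMA node. Not the TARUC implementation.
#include <cstdio>
#include <cuda_runtime.h>
#include <numa.h>

int main() {
    const int numaNode = 0;          // host NUMA node to allocate on (assumption)
    const int gpuId    = 0;          // target GPU (assumption)
    const size_t bytes = 256 << 20;  // 256 MiB transfer (assumption)
    const int reps     = 10;         // repetitions to average over

    if (numa_available() < 0) { fprintf(stderr, "libnuma unavailable\n"); return 1; }
    numa_run_on_node(numaNode);                      // pin the calling thread near the memory
    void *src = numa_alloc_onnode(bytes, numaNode);  // host buffer bound to the chosen node
    cudaHostRegister(src, bytes, cudaHostRegisterDefault);  // pin pages so copies use DMA

    cudaSetDevice(gpuId);
    void *dst = nullptr;
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * reps / (ms * 1e-3) / 1e9;
    printf("NUMA node %d -> GPU %d: %.2f GB/s\n", numaNode, gpuId, gbps);

    cudaFree(dst);
    cudaHostUnregister(src);
    numa_free(src, bytes);
    return 0;
}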

