research-article

History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers

Published: 04 April 2017

Abstract

NUMA (non-uniform memory access) servers are commonly used in high-performance computing and datacenters. Within each server, a processor-interconnect (e.g., Intel QPI, AMD HyperTransport) is used to communicate between the different sockets or nodes. In this work, we explore the impact of the processor-interconnect on overall performance, in particular the performance unfairness caused by processor-interconnect arbitration. It is well known that locally-fair arbitration does not guarantee globally-fair bandwidth sharing, as closer nodes receive more bandwidth in a multi-hop network. However, this work demonstrates that the opposite can occur in a commodity NUMA server, where remote nodes receive higher bandwidth (and perform better). We analyze this problem and identify that this occurs because of external concentration used in router micro-architectures for processor-interconnects without globally-aware arbitration. While accessing remote memory can occur in any NUMA system, performance unfairness (or performance variation) is more critical in cloud computing and virtual machines with shared resources. We demonstrate how this unfairness creates significant performance variation when a workload is executed on the Xen virtualization platform. We then provide analysis using synthetic workloads to better understand the source of unfairness and eliminate the impact of other shared resources, including the shared last-level cache and main memory. To provide fairness, we propose a novel, history-based arbitration that tracks the history of arbitration grants made in the previous history window. A weighted arbitration is done based on the history to provide global fairness. Through simulations, we show our proposed history-based arbitration can provide global fairness and minimize the processor-interconnect performance unfairness at low cost.
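
As a rough illustration of the contrast drawn in the abstract, the following Python sketch compares locally-fair round-robin arbitration with a simple history-weighted arbiter. It is not the arbiter proposed in the paper. The abstract does not give the weighting rule, so the rule used here (a port's weight equals the number of distinct sources granted on that port in the previous window) is an assumption, as are the function name `simulate`, the source labels, and the window length. The model is a single saturated arbiter in which one port carries one source and the other port concentrates three sources.

```python
from collections import Counter
from itertools import cycle


def simulate(ports, cycles, window=None):
    """Count grants per source at one arbiter (hypothetical model).

    ports  : list of lists; ports[p] holds the source ids sharing input port p.
             Every source is assumed to always have a packet ready.
    window : None for plain round-robin across ports; otherwise the history
             window length in cycles (assumed weighting rule, see lead-in).
    """
    feeds = [cycle(sources) for sources in ports]  # sources on a port take turns
    weights = [1] * len(ports)                     # equal weights = round-robin
    credit = [0] * len(ports)
    seen = [set() for _ in ports]                  # sources granted this window
    grants = Counter()

    for t in range(cycles):
        # Smooth weighted round-robin: every port earns its weight in credit,
        # the port with the most credit wins and pays back the total weight.
        for p, w in enumerate(weights):
            credit[p] += w
        winner = max(range(len(ports)), key=lambda p: credit[p])
        credit[winner] -= sum(weights)

        src = next(feeds[winner])
        grants[src] += 1
        seen[winner].add(src)

        # At a window boundary, derive the next weights from the grant history.
        if window and (t + 1) % window == 0:
            weights = [max(1, len(s)) for s in seen]
            credit = [0] * len(ports)
            seen = [set() for _ in ports]
    return grants


if __name__ == "__main__":
    topology = [["local"], ["r1", "r2", "r3"]]
    print("round-robin  :", dict(simulate(topology, 1200)))
    print("history-based:", dict(simulate(topology, 1200, window=120)))
```

Under these assumptions, plain round-robin hands the single-source port half of the bandwidth, while the history-weighted variant brings the per-source shares close to equal once the first (warm-up) window has passed.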



    • Published in

      ACM SIGPLAN Notices  Volume 52, Issue 4
      ASPLOS '17
      April 2017
      811 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/3093336
      • ACM Conferences
        ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
        April 2017
        856 pages
        ISBN:9781450344654
        DOI:10.1145/3037697

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

