research-article

Paragon: QoS-aware scheduling for heterogeneous datacenters

Published: 16 March 2013

Abstract

Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications.

We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods and, instead of profiling each application in detail, leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload with respect to heterogeneity and interference in multiple shared resources, by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state.
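
The classify-then-schedule idea can be illustrated with a toy sketch. It is not Paragon's algorithm: the abstract only says "collaborative filtering techniques", and this sketch substitutes a simple nearest-neighbour filter with inverse-distance weights. The score table, application names, server records, and the functions `predict_scores` and `greedy_place` are all hypothetical, and interference is not modeled at all.

```python
# Hypothetical, invented scores: each row is a previously scheduled application,
# each column a server platform; higher means better performance.
KNOWN = {
    "web":       [0.9, 0.8, 0.3, 0.2],
    "cache":     [1.0, 0.9, 0.4, 0.3],
    "analytics": [0.2, 0.3, 0.9, 1.0],
}


def predict_scores(partial, known, eps=1e-6):
    """Fill in missing scores (None) for a new application.

    Known applications are weighted by inverse squared distance to the new
    application, measured only on the platforms where it was actually profiled.
    """
    seen = [j for j, v in enumerate(partial) if v is not None]
    weights = {}
    for name, row in known.items():
        dist2 = sum((partial[j] - row[j]) ** 2 for j in seen)
        weights[name] = 1.0 / (dist2 + eps)
    total = sum(weights.values())

    filled = []
    for j, v in enumerate(partial):
        if v is None:
            # Weighted average of what similar applications scored here.
            v = sum(w * known[n][j] for n, w in weights.items()) / total
        filled.append(v)
    return filled


def greedy_place(scores, servers):
    """Pick the free server whose platform has the highest predicted score.

    `servers` is a list of dicts with keys "name", "platform" (column index)
    and "free" (remaining slots). Returns the chosen name, or None if full.
    """
    candidates = [s for s in servers if s["free"] > 0]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: scores[s["platform"]])
    best["free"] -= 1  # consume one slot on the chosen server
    return best["name"]


if __name__ == "__main__":
    # The new application was profiled on platforms 0 and 2 only.
    scores = predict_scores([0.25, None, 0.85, None], KNOWN)
    servers = [
        {"name": "s0", "platform": 1, "free": 1},
        {"name": "s1", "platform": 3, "free": 1},
        {"name": "s2", "platform": 2, "free": 2},
    ]
    print(greedy_place(scores, servers))
```

With these invented scores, the new application is closest to `analytics` on the two platforms where it was measured, so the sketch predicts a high score on platform 3 and places it on `s1`. The point is only that two short measurements plus prior knowledge can stand in for a full profile.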

We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious and least-loaded schedulers provide similar guarantees for only 14%, 11% and 3% of workloads, respectively. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
April 2013
540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
