Abstract
Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications.
We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods and, instead of profiling each application in detail, leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload with respect to heterogeneity and interference in multiple shared resources by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state.
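The classification idea can be illustrated with a small matrix-completion example. This is a minimal sketch under stated assumptions, not Paragon's actual implementation: the abstract does not specify the factorization method, so the code below uses a plain low-rank factorization fitted by full-batch gradient descent. The function name `complete_scores`, the score matrix, and all hyperparameters are hypothetical. Rows stand for applications, columns for hardware platforms, and `NaN` marks a pairing that was never profiled.

```python
import numpy as np

def complete_scores(scores, rank=1, steps=3000, lr=0.02, reg=0.01, seed=0):
    """Estimate the NaN entries of `scores` with a low-rank factorization.

    scores: 2-D array, rows = applications, columns = platforms.
    Returns a dense matrix of the same shape with every entry estimated.
    """
    rng = np.random.default_rng(seed)
    known = ~np.isnan(scores)
    filled = np.where(known, scores, 0.0)
    n_apps, n_cols = scores.shape
    P = rng.normal(scale=0.1, size=(n_apps, rank))   # per-application factors
    Q = rng.normal(scale=0.1, size=(n_cols, rank))   # per-platform factors
    for _ in range(steps):
        # The error is measured on profiled entries only.
        err = np.where(known, filled - P @ Q.T, 0.0)
        P_step = err @ Q - reg * P
        Q_step = err.T @ P - reg * Q
        P += lr * P_step
        Q += lr * Q_step
    return P @ Q.T

# Hypothetical data: four applications, three platforms, scores that follow
# a rank-1 pattern (application demand times platform speed).
truth = np.outer([0.5, 1.0, 1.5, 2.0], [1.0, 0.8, 0.6])
scores = truth.copy()
scores[3, 0] = np.nan   # the incoming application was never run on platform 0
scores[1, 2] = np.nan   # an older application also has a gap

completed = complete_scores(scores)
best_platform = int(np.argmax(completed[3]))
print(completed[3], best_platform)
```

In this toy setup the incoming application (row 3) was profiled on only two platforms, and its score on the third is inferred from how previously seen applications behaved across all three.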
We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers provide similar guarantees for only 14%, 11%, and 3% of workloads, respectively. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
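The greedy, interference-aware placement described in the abstract can be sketched as follows. This is an illustration under simplifying assumptions, not the scheduler evaluated above: it models a single shared resource, and the `App` and `Server` classes, the `caused` and `tolerated` fields, the slot counts, and all numbers are hypothetical. Each arriving application goes to the best-scoring platform among the servers where no colocated application would exceed its tolerance.

```python
from dataclasses import dataclass, field

@dataclass
class App:
    name: str
    platform_score: dict   # platform name -> predicted performance (higher is better)
    caused: float          # pressure this app puts on the shared resource
    tolerated: float       # pressure from others it can absorb and still meet QoS

@dataclass
class Server:
    name: str
    platform: str
    slots: int
    residents: list = field(default_factory=list)

def fits(app, server):
    """True if adding `app` keeps every colocated app within its tolerance."""
    if len(server.residents) >= server.slots:
        return False
    group = server.residents + [app]
    total = sum(a.caused for a in group)
    # Each app must tolerate the pressure caused by everyone else on the server.
    return all(total - a.caused <= a.tolerated for a in group)

def place(app, servers):
    """Greedy step: best-scoring platform among servers that pass the check."""
    candidates = [s for s in servers if fits(app, s)]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: app.platform_score[s.platform])
    best.residents.append(app)
    return best

servers = [Server("s1", "fast", slots=2), Server("s2", "slow", slots=2)]
apps = [
    App("a", {"fast": 1.0, "slow": 0.6}, caused=0.7, tolerated=0.2),
    App("b", {"fast": 0.9, "slow": 0.7}, caused=0.5, tolerated=0.3),
    App("c", {"fast": 0.8, "slow": 0.5}, caused=0.1, tolerated=0.8),
]
placement = {}
for app in apps:
    chosen = place(app, servers)
    placement[app.name] = chosen.name if chosen else None
print(placement)
```

A least-loaded policy would look only at how many applications each server already hosts. The sketch instead rejects the fast server for `b` because `a` could not tolerate the added pressure, while still letting the lightweight `c` share it.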
Paragon: QoS-aware scheduling for heterogeneous datacenters
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems