ABSTRACT
Small jobs, which are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after state-of-the-art straggler mitigation techniques are applied, these latency-sensitive jobs have stragglers that are on average 8 times slower than the median task in the job. Such stragglers increase the average job duration by 47%. This is because current mitigation techniques all involve an element of waiting and speculation. We instead propose full cloning of small jobs, which avoids waiting and speculation altogether. Cloning small jobs only marginally increases utilization because, as the workloads show, while the majority of jobs are small, they consume only a small fraction of the resources. The main challenge of cloning, however, is that extra clones can cause contention for intermediate data. We use a technique, delay assignment, that efficiently avoids such contention. An evaluation of our system, Dolly, using production workloads shows that small jobs speed up by 34% to 46% even after state-of-the-art mitigation techniques have been applied, using just 5% extra resources for cloning.
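To make the intuition behind full cloning concrete, the following is a minimal sketch, not Dolly's implementation. It assumes each copy of a task straggles independently with a fixed probability `p`, a simplifying assumption not stated in the abstract. Under that assumption, a cloned task is slow only if all of its clones straggle, and the job is slow if any of its tasks is slow. The function names are hypothetical and chosen for illustration.

```python
from collections.abc import Sequence


def prob_job_has_straggler(num_tasks: int, clones: int, p: float) -> float:
    """Probability that a job is delayed by at least one straggling task.

    Hypothetical model: every task copy straggles independently with
    probability p, and each of the num_tasks tasks runs `clones` copies.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if num_tasks < 1 or clones < 1:
        raise ValueError("num_tasks and clones must be positive")
    # A task is slow only if every one of its clones straggles.
    task_slow = p ** clones
    # The job is slow if any of its tasks is slow.
    return 1.0 - (1.0 - task_slow) ** num_tasks


def job_completion_time(clone_durations: Sequence[Sequence[float]]) -> float:
    """Completion time of a job whose tasks all run to completion in parallel.

    Each inner sequence holds the durations of one task's clones: a task
    finishes when its fastest clone does, and the job finishes when its
    slowest task does.
    """
    return max(min(task) for task in clone_durations)


if __name__ == "__main__":
    # A 10-task job where each copy straggles with probability 0.05.
    print(prob_job_has_straggler(10, clones=1, p=0.05))
    print(prob_job_has_straggler(10, clones=3, p=0.05))
    # Three tasks, two clones each: one slow clone per task no longer matters.
    print(job_completion_time([[5.0, 40.0], [6.0, 7.0], [80.0, 4.0]]))
```

Without cloning, the 10-task job in this model is delayed with probability 1 - 0.95^10, roughly 0.40; with three clones per task the probability drops to about 0.001. The sketch deliberately ignores the cost the abstract highlights: extra clones compete for the same intermediate data, which is the contention that delay assignment is meant to avoid.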
Effective straggler mitigation: attack of the clones