research-article
Open Access

Achievable Stability in Redundancy Systems

Published: 30 November 2020

Abstract

We consider a system with $N$ parallel servers where each incoming job is immediately replicated to, say, $d$ servers. Each of the $N$ servers has its own queue and follows an FCFS discipline. As soon as the first replica of a job completes, the remaining replicas are abandoned. We investigate the achievable stability region for a fairly general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand, we show for New-Better-than-Used (NBU) distributed speed variations that no replication ($d=1$) gives a strictly larger stability region than replication ($d>1$). Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. When job types are not observable, we show that for New-Worse-than-Used (NWU) distributed speed variations full replication ($d=N$) gives a larger stability region than no replication ($d=1$).
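
As a rough intuition for why the NBU/NWU distinction matters (this is a sketch, not the paper's model or proof), consider the total server time a single job consumes when all $d$ replicas start at the same moment and are cancelled at the first completion. The Python snippet below assumes that each replica's service time is proportional to an i.i.d. speed variation $S_i$, so the job occupies $d$ servers for $\min(S_1,\dots,S_d)$ time units each. The Weibull distribution is our illustrative choice (shape above 1 is NBU, shape below 1 is NWU, shape 1 is exponential), and the function names `exact_work_ratio` and `simulated_work_ratio` are hypothetical. The snippet compares $d \cdot E[\min(S_1,\dots,S_d)]$ with $E[S]$, both in closed form and by Monte Carlo.

```python
import math
import random


def exact_work_ratio(shape: float, d: int) -> float:
    """Closed form of d * E[min(S_1..S_d)] / E[S] for Weibull speed variations.

    Assumption: the S_i are i.i.d. Weibull with the given shape. The minimum
    of d such variables is again Weibull with scale reduced by d**(-1/shape),
    so the ratio is d**(1 - 1/shape).
    """
    return d ** (1.0 - 1.0 / shape)


def simulated_work_ratio(shape: float, d: int, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio.

    Numerator: total server time used by d synchronized replicas that are
    all cancelled when the fastest one finishes.
    Denominator: server time used by a single, unreplicated copy.
    """
    rng = random.Random(seed)
    replicated = 0.0
    single = 0.0
    for _ in range(n_samples):
        fastest = min(rng.weibullvariate(1.0, shape) for _ in range(d))
        replicated += d * fastest
        single += rng.weibullvariate(1.0, shape)
    return replicated / single


if __name__ == "__main__":
    d = 4
    for label, shape in [("NBU", 2.0), ("exponential", 1.0), ("NWU", 0.5)]:
        exact = exact_work_ratio(shape, d)
        estimate = simulated_work_ratio(shape, d, n_samples=50_000, seed=1)
        print(f"{label:12s} shape={shape}: exact={exact:.3f}, simulated={estimate:.3f}")
```

A ratio above 1 means synchronized replication consumes more total server time per job than running a single copy, and a ratio below 1 means it consumes less. Under these assumptions the ratio exceeds 1 for the NBU example and falls below 1 for the NWU example, which points in the same direction as the abstract's results. The actual stability regions in the paper also depend on job types, server heterogeneity, and queueing dynamics that this sketch ignores.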

