research-article

Portfolio-driven Resource Management for Transient Cloud Servers

Published: 13 June 2017

Abstract

Cloud providers have begun to offer their surplus capacity in the form of low-cost transient servers, which can be revoked unilaterally at any time. While the low cost of transient servers makes them attractive for a wide range of applications, such as data processing and scientific computing, failures due to server revocation can severely degrade application performance. Since different transient server types offer different cost and availability tradeoffs, we present the notion of server portfolios, which is based on financial portfolio modeling. Server portfolios enable construction of an "optimal" mix of servers to meet an application's sensitivity to cost and revocation risk. We implement model-driven portfolios in a system called ExoSphere, and show how diverse applications can use portfolios and application-specific policies to gracefully handle transient servers. We show that ExoSphere enables widely-used parallel applications such as Spark, MPI, and BOINC to be made transiency-aware with modest effort. Our experiments show that, by allowing applications to use suitable transiency-aware policies, ExoSphere achieves 80% cost savings compared to on-demand servers and greatly reduces revocation risk compared to existing approaches.
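
To make the portfolio idea concrete, the sketch below applies a generic Markowitz-style mean-variance formulation to server selection: choose weights over server types that minimize expected cost plus a risk-aversion factor times the portfolio's revocation variance. This is an illustration only, not ExoSphere's actual model, which the abstract does not specify. The costs, the covariance matrix, the `risk_aversion` parameter, and all function names are hypothetical, and the solver (projected gradient descent) is a stand-in chosen to keep the example dependency-free.

```python
# Hypothetical hourly costs for three transient server types (assumed values).
COSTS = [0.10, 0.15, 0.30]
# Hypothetical covariance of revocation events between server types (assumed).
# The cheapest type is also the riskiest in this toy setup.
COV = [
    [0.09, 0.01, 0.000],
    [0.01, 0.04, 0.000],
    [0.00, 0.00, 0.001],
]

def portfolio_cost(weights, costs):
    """Expected cost per unit of capacity for the given mix."""
    return sum(w * c for w, c in zip(weights, costs))

def portfolio_risk(weights, cov):
    """Portfolio variance w^T * cov * w, used here as the revocation-risk measure."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    theta, cumulative = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold for the largest valid k
    return [max(x - theta, 0.0) for x in v]

def optimize_portfolio(costs, cov, risk_aversion, steps=5000, lr=0.05):
    """Minimize cost(w) + risk_aversion * risk(w) over the simplex."""
    n = len(costs)
    w = [1.0 / n] * n  # start from an even mix
    for _ in range(steps):
        # Gradient of the objective; cov is assumed symmetric.
        grad = [costs[i] + 2.0 * risk_aversion *
                sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = project_to_simplex([w[i] - lr * grad[i] for i in range(n)])
    return w

# A cost-only application versus a risk-sensitive one.
cheap_mix = optimize_portfolio(COSTS, COV, risk_aversion=0.0)
safe_mix = optimize_portfolio(COSTS, COV, risk_aversion=10.0)

for name, mix in (("cost-only", cheap_mix), ("risk-averse", safe_mix)):
    print(name, [round(w, 3) for w in mix],
          "cost:", round(portfolio_cost(mix, COSTS), 4),
          "risk:", round(portfolio_risk(mix, COV), 4))
```

With `risk_aversion` set to zero the solver concentrates on the cheapest server type. Raising it spreads the mix across types, trading higher cost for lower variance, which is the cost-versus-revocation-risk tradeoff the abstract describes.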



Published in

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 1, Issue 1
June 2017
712 pages
EISSN: 2476-1249
DOI: 10.1145/3107080

Copyright © 2017 ACM

Publisher

Association for Computing Machinery

New York, NY, United States
