DOI: 10.1145/3434770.3459730
research-article
Open access

Rearchitecting Kubernetes for the Edge

Published: 26 April 2021

Abstract

Recent years have seen Kubernetes emerge as a primary choice for container orchestration. Kubernetes largely targets the cloud environment, but new use cases require performant, available and scalable orchestration at the edge. Kubernetes stores all cluster state in etcd, a strongly consistent key-value store. We find that at larger etcd cluster sizes, which offer higher availability, write request latency increases significantly and throughput decreases to a similar extent. Because approximately 30% of Kubernetes requests are writes, this directly impacts the request latency and availability of Kubernetes, reducing its suitability for the edge. We revisit the requirement of strong consistency and propose an eventually consistent approach instead. This enables higher performance, availability and scalability whilst still supporting the broad needs of Kubernetes. Our aim is to make Kubernetes much more suitable for performance-critical, dynamically-scaled edge solutions.
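
To make the contrast with strong consistency concrete, the sketch below shows one simple kind of eventually consistent store: a last-writer-wins map, a basic state-based CRDT. It is an illustration only, not the design proposed in the paper, which the abstract does not detail. The class name `LWWMap`, its fields, and the example keys are all hypothetical. Each replica accepts writes locally without contacting a quorum, and replicas converge once they exchange and merge state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A write is ordered by (logical clock, replica id); the replica id breaks ties.
Stamp = Tuple[int, str]


@dataclass
class LWWMap:
    """Hypothetical last-writer-wins map (illustrative, not the paper's design)."""
    replica_id: str
    clock: int = 0
    entries: Dict[str, Tuple[Stamp, Optional[str]]] = field(default_factory=dict)

    def put(self, key: str, value: Optional[str]) -> None:
        # Local write: applied immediately, with no round trip to other replicas.
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica_id), value)

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        return entry[1] if entry else None

    def merge(self, other: "LWWMap") -> None:
        # Per key, keep the entry with the larger stamp. Taking a maximum is
        # commutative, associative and idempotent, so merge order does not matter.
        for key, (stamp, value) in other.entries.items():
            current = self.entries.get(key)
            if current is None or stamp > current[0]:
                self.entries[key] = (stamp, value)
        # Advance the local clock so later local writes supersede merged ones.
        self.clock = max(self.clock, other.clock)


# Two replicas accept conflicting writes independently, then exchange state.
a, b = LWWMap("a"), LWWMap("b")
a.put("pod/web", "Pending")
b.put("pod/web", "Running")
b.put("pod/db", "Running")
a.merge(b)
b.merge(a)
assert a.entries == b.entries  # both replicas hold the same state after merging
```

The trade-off is visible in the example: both writes to `pod/web` succeed immediately, but one of them is silently superseded at merge time, which is why such an approach has to be checked against what each Kubernetes component actually needs.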


    Published In

    EdgeSys '21: Proceedings of the 4th International Workshop on Edge Systems, Analytics and Networking
    April 2021
    84 pages
    ISBN: 9781450382915
    DOI: 10.1145/3434770
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CRDTs
    2. Kubernetes
    3. edge
    4. eventual consistency
    5. orchestration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EuroSys '21

    Acceptance Rates

    Overall Acceptance Rate 10 of 23 submissions, 43%

    Cited By

    • (2024) On the Optimization of Kubernetes toward the Enhancement of Cloud Computing. Mathematics 12:16 (2476). DOI: 10.3390/math12162476. Online publication date: 10-Aug-2024
    • (2024) Enhancing Autonomous Driving Robot Systems with Edge Computing and LDM Platforms. Electronics 13:14 (2740). DOI: 10.3390/electronics13142740. Online publication date: 12-Jul-2024
    • (2024) Komet: A Serverless Platform for Low-Earth Orbit Edge Services. Proceedings of the 2024 ACM Symposium on Cloud Computing (866-882). DOI: 10.1145/3698038.3698517. Online publication date: 20-Nov-2024
    • (2024) Immutability and non-repudiation in the exchange of key messages within the EU IoT-Edge-Cloud Continuum. Proceedings of the 19th International Conference on Availability, Reliability and Security (1-8). DOI: 10.1145/3664476.3669918. Online publication date: 30-Jul-2024
    • (2024) Improving Real-Time Data Streams Performance on Autonomous Surface Vehicles using DataX. 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (222-229). DOI: 10.1109/PDP62718.2024.00038. Online publication date: 20-Mar-2024
    • (2024) Edge Orchestration Framework for AI-assisted Link Failure Forecasting and Recovery. 2024 24th International Conference on Transparent Optical Networks (ICTON) (1-4). DOI: 10.1109/ICTON62926.2024.10647881. Online publication date: 14-Jul-2024
    • (2024) Towards fault-tolerant deployment of mobile robot navigation in the edge: an experimental study. 2024 IEEE International Conference on Robotics and Automation (ICRA) (6791-6797). DOI: 10.1109/ICRA57147.2024.10611013. Online publication date: 13-May-2024
    • (2024) Failover Timing Analysis in Orchestrating Container-based Critical Applications. 2024 19th European Dependable Computing Conference (EDCC) (81-84). DOI: 10.1109/EDCC61798.2024.00026. Online publication date: 8-Apr-2024
    • (2024) Cross-Domain Solutions (CDS): A Comprehensive Survey. IEEE Access 12 (163551-163620). DOI: 10.1109/ACCESS.2024.3483659. Online publication date: 2024
    • (2024) Making Ecosystem Modeling Operational–A Novel Distributed Execution Framework to Systematically Explore Ecological Responses to Divergent Climate Trajectories. Earth's Future 12:3. DOI: 10.1029/2023EF004295. Online publication date: 7-Mar-2024
