DOI:10.1145/3434770.3459730
research-article
Open access

Rearchitecting Kubernetes for the Edge

Published: 26 April 2021
    Abstract

    Recent years have seen Kubernetes emerge as a primary choice for container orchestration. Kubernetes largely targets the cloud environment, but new use cases require performant, available and scalable orchestration at the edge. Kubernetes stores all cluster state in etcd, a strongly consistent key-value store. We find that larger etcd clusters, which offer higher availability, incur significantly higher write request latency and a similar decrease in throughput. Because approximately 30% of Kubernetes requests are writes, this directly impacts the request latency and availability of Kubernetes, reducing its suitability for the edge. We revisit the requirement of strong consistency and propose an eventually consistent approach instead. This enables higher performance, availability and scalability whilst still supporting the broad needs of Kubernetes. The aim is to make Kubernetes much more suitable for performance-critical, dynamically-scaled edge solutions.
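
    The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' design. The first part assumes the usual majority-quorum rule of Raft, which etcd uses: a write commits only after a majority of members acknowledge it, so a larger cluster tolerates more failures but each write waits on more nodes. The second part shows a grow-only counter, a textbook state-based CRDT, as one example of eventually consistent state that converges without a quorum. The names quorum_size, failures_tolerated and GCounter, and the replica labels, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return members - quorum_size(members)


@dataclass
class GCounter:
    """Grow-only counter: a minimal state-based CRDT (illustrative only)."""

    counts: dict = field(default_factory=dict)

    def increment(self, replica: str, amount: int = 1) -> None:
        # Each replica only ever advances its own entry, with no coordination.
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    # Larger clusters tolerate more failures but need more acknowledgements.
    for n in (1, 3, 5, 7):
        print(f"members={n} quorum={quorum_size(n)} "
              f"tolerated={failures_tolerated(n)}")

    # Two replicas accept writes independently, then exchange state.
    a, b = GCounter(), GCounter()
    a.increment("edge-a", 2)
    b.increment("edge-b", 3)
    a.merge(b)
    b.merge(a)
    print(a.value() == b.value())
```

    In the quorum model every write pays for a round trip to a majority of members. In the CRDT model each replica accepts writes locally and reconciles later, at the cost of readers possibly observing stale state in the meantime.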

      Published In

      EdgeSys '21: Proceedings of the 4th International Workshop on Edge Systems, Analytics and Networking
      April 2021
      84 pages
      ISBN:9781450382915
      DOI:10.1145/3434770
      This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. CRDTs
      2. Kubernetes
      3. edge
      4. eventual consistency
      5. orchestration

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      EuroSys '21
      Acceptance Rates

      Overall Acceptance Rate 10 of 23 submissions, 43%

      Article Metrics

      • Downloads (Last 12 months): 928
      • Downloads (Last 6 weeks): 101

      Cited By

      • (2023) Cloud-Native Workload Orchestration at the Edge: A Deployment Review and Future Directions. Sensors 23:4 (2215). DOI:10.3390/s23042215. Online publication date: 16-Feb-2023
      • (2023) Vendor-Agnostic Reconfiguration of Kubernetes Clusters in Cloud Federations. Future Internet 15:2 (63). DOI:10.3390/fi15020063. Online publication date: 1-Feb-2023
      • (2023) Understanding the Neglected Cost of Serverless Cluster Management. Proceedings of the 4th Workshop on Resource Disaggregation and Serverless (22-28). DOI:10.1145/3605181.3626286. Online publication date: 23-Oct-2023
      • (2022) Distributed point-to-point routing method for tasks in cloud control systems. Journal of Systems Engineering and Electronics 33:4 (792-804). DOI:10.23919/JSEE.2022.000079. Online publication date: Aug-2022
      • (2022) Scalable Data Plane Caching for Kubernetes. 2022 18th International Conference on Network and Service Management (CNSM) (345-351). DOI:10.23919/CNSM55787.2022.9964497. Online publication date: 31-Oct-2022
      • (2022) DSON. Proceedings of the VLDB Endowment 15:5 (1053-1065). DOI:10.14778/3510397.3510403. Online publication date: 18-May-2022
      • (2022) Accelerating kubernetes with in-network caching. Proceedings of the SIGCOMM '22 Poster and Demo Sessions (40-42). DOI:10.1145/3546037.3546058. Online publication date: 22-Aug-2022
      • (2022) Oakestra. Proceedings of the SIGCOMM '22 Poster and Demo Sessions (34-36). DOI:10.1145/3546037.3546056. Online publication date: 22-Aug-2022
      • (2022) KOLE. Proceedings of the 13th Symposium on Cloud Computing (196-209). DOI:10.1145/3542929.3563462. Online publication date: 7-Nov-2022
      • (2022) DICer. Proceedings of the 9th ACM Conference on Information-Centric Networking (45-55). DOI:10.1145/3517212.3558084. Online publication date: 6-Sep-2022
