DOI: 10.1145/3445814.3446714 (ASPLOS conference proceedings)

Benchmarking, analysis, and optimization of serverless function snapshots

Published: 17 April 2021

ABSTRACT

Serverless computing has seen rapid adoption due to its high scalability and flexible, pay-as-you-go billing model. In serverless, developers structure their services as a collection of functions, sporadically invoked by various events like clicks. High inter-arrival time variability of function invocations motivates the providers to start new function instances upon each invocation, leading to significant cold-start delays that degrade user experience. To reduce cold-start latency, the industry has turned to snapshotting, whereby an image of a fully-booted function is stored on disk, enabling a faster invocation compared to booting a function from scratch.

This work introduces vHive, an open-source framework for serverless experimentation with the goal of enabling researchers to study and innovate across the entire serverless stack. Using vHive, we characterize a state-of-the-art snapshot-based serverless infrastructure, based on the industry-leading Containerd orchestration framework and Firecracker hypervisor technologies. We find that the execution time of a function started from a snapshot is 95% higher, on average, than when the same function is memory-resident. We show that the high latency is attributable to frequent page faults as the function's state is brought from disk into guest memory one page at a time. Our analysis further reveals that functions access the same stable working set of pages across different invocations of the same function. By leveraging this insight, we build REAP, a lightweight software mechanism for serverless hosts that records functions' stable working set of guest memory pages and proactively prefetches it from disk into memory. Compared to baseline snapshotting, REAP slashes the cold-start delays by 3.7x, on average.
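
The record-and-prefetch idea described above can be illustrated with a toy model. The sketch below is not the REAP implementation, which operates on the guest memory of Firecracker VMs; it only simulates a snapshot image as an in-memory byte string and counts on-demand page loads. The names `SnapshotBackedMemory`, `invoke`, and `PAGE_SIZE`, as well as the list of touched pages, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class SnapshotBackedMemory:
    """Toy model of guest memory that is restored lazily from a snapshot."""
    image: bytes                                   # stands in for the on-disk snapshot
    resident: dict = field(default_factory=dict)   # page number -> page contents
    faults: int = 0                                # pages loaded on demand
    trace: list = field(default_factory=list)      # first-touch order of pages

    def _load(self, page: int) -> bytes:
        start = page * PAGE_SIZE
        return self.image[start:start + PAGE_SIZE]

    def read(self, page: int) -> bytes:
        """Access a page, loading it on demand (a simulated page fault)."""
        if page not in self.resident:
            self.faults += 1
            self.trace.append(page)
            self.resident[page] = self._load(page)
        return self.resident[page]

    def prefetch(self, working_set) -> None:
        """Eagerly load a previously recorded working set."""
        for page in working_set:
            if page not in self.resident:
                self.resident[page] = self._load(page)


def invoke(memory: SnapshotBackedMemory, pages_touched) -> int:
    """Hypothetical invocation that touches a fixed sequence of pages."""
    for page in pages_touched:
        memory.read(page)
    return memory.faults


image = bytes(64 * PAGE_SIZE)   # 64 zero-filled pages as a fake snapshot
touched = [3, 7, 3, 12, 40, 7]  # assumed stable access pattern

# Record phase: the first invocation faults on every distinct page it touches.
recorder = SnapshotBackedMemory(image)
record_faults = invoke(recorder, touched)
working_set = list(recorder.trace)

# Prefetch phase: a later invocation loads the recorded set up front.
replay = SnapshotBackedMemory(image)
replay.prefetch(working_set)
replay_faults = invoke(replay, touched)

print(record_faults, replay_faults)
```

In this model the benefit depends entirely on the assumption that later invocations touch the same pages as the recorded one, which mirrors the stable-working-set observation in the abstract. Pages outside the recorded set would still be loaded on demand.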


