ABSTRACT
Serverless computing has seen rapid adoption due to its high scalability and flexible, pay-as-you-go billing model. In serverless, developers structure their services as a collection of functions, sporadically invoked by various events like clicks. High inter-arrival time variability of function invocations motivates the providers to start new function instances upon each invocation, leading to significant cold-start delays that degrade user experience. To reduce cold-start latency, the industry has turned to snapshotting, whereby an image of a fully-booted function is stored on disk, enabling a faster invocation compared to booting a function from scratch.
This work introduces vHive, an open-source framework for serverless experimentation with the goal of enabling researchers to study and innovate across the entire serverless stack. Using vHive, we characterize a state-of-the-art snapshot-based serverless infrastructure built on the industry-leading Containerd orchestration framework and Firecracker hypervisor. We find that the execution time of a function started from a snapshot is 95% higher, on average, than when the same function is memory-resident. We show that the high latency is attributable to frequent page faults as the function's state is brought from disk into guest memory one page at a time. Our analysis further reveals that a function accesses the same stable working set of pages across its invocations. By leveraging this insight, we build REAP, a lightweight software mechanism for serverless hosts that records each function's stable working set of guest memory pages and proactively prefetches it from disk into memory. Compared to baseline snapshotting, REAP reduces cold-start delays by 3.7x, on average.
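The record-and-prefetch idea can be illustrated with a toy model. The sketch below is not the REAP implementation: the class `SnapshotMemory`, the helpers `invoke` and `record_working_set`, and the 4 KiB page size are all assumptions made for illustration, and the snapshot is an in-memory byte string rather than a file on disk. It only shows the principle that pages faulted in during one invocation can be recorded and installed eagerly before the next one.

```python
PAGE_SIZE = 4096  # assumed guest page size in bytes (illustrative)


class SnapshotMemory:
    """Hypothetical guest memory that is lazily backed by a snapshot image."""

    def __init__(self, image: bytes):
        self.image = image
        self.resident = {}  # page number -> page contents already in memory
        self.faults = []    # page numbers that had to be served on demand

    def _load(self, page_no: int) -> bytes:
        start = page_no * PAGE_SIZE
        return self.image[start:start + PAGE_SIZE]

    def read(self, addr: int) -> int:
        """Return the byte at addr, faulting its page in if it is missing."""
        page_no, offset = divmod(addr, PAGE_SIZE)
        if page_no not in self.resident:
            self.faults.append(page_no)  # one on-demand page fault
            self.resident[page_no] = self._load(page_no)
        return self.resident[page_no][offset]

    def prefetch(self, working_set):
        """Eagerly install the recorded pages before the function runs."""
        for page_no in working_set:
            self.resident[page_no] = self._load(page_no)


def invoke(mem: SnapshotMemory, addresses) -> int:
    """Stand-in for a function invocation: touch a sequence of addresses."""
    return sum(mem.read(addr) for addr in addresses)


def record_working_set(image: bytes, addresses) -> list:
    """Run one invocation from the snapshot and record the pages it faulted on."""
    mem = SnapshotMemory(image)
    invoke(mem, addresses)
    return list(mem.faults)  # each page appears once, in first-touch order


# A 16-page snapshot and an invocation that touches pages 1, 5, and 9.
image = bytes(i % 256 for i in range(16 * PAGE_SIZE))
addresses = [1 * PAGE_SIZE + 7, 5 * PAGE_SIZE, 5 * PAGE_SIZE + 100, 9 * PAGE_SIZE + 3]

working_set = record_working_set(image, addresses)

# Later invocation: prefetch the recorded pages, then run the function.
warm = SnapshotMemory(image)
warm.prefetch(working_set)
warm_result = invoke(warm, addresses)

# Baseline invocation: every page is brought in one fault at a time.
cold = SnapshotMemory(image)
cold_result = invoke(cold, addresses)
```

In this model the prefetched invocation takes no on-demand faults as long as it stays within the recorded working set, while the baseline takes one fault per distinct page. Any page outside the recorded set still falls back to on-demand loading, so the result of the function is the same either way.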
Index Terms
Benchmarking, analysis, and optimization of serverless function snapshots