Abstract
OS-level virtualization is often used for server consolidation in data centers because of its high efficiency. However, sharing storage stack services among colocated containers causes contention on shared kernel data structures and locks within the I/O stack, leading to severe performance degradation on manycore platforms that incorporate fast storage technologies (e.g., SSDs based on nonvolatile memories).
This article presents MultiLanes, a virtualized storage system for OS-level virtualization on manycores. MultiLanes builds an isolated I/O stack on top of a virtualized storage device for each container, eliminating contention on kernel data structures and locks between containers and thus allowing them to scale to manycores. We also propose a set of techniques that make the overhead induced by storage-device virtualization negligible and that scale the virtualized devices to manycores on the host, which itself scales poorly. To reduce contention within a single container, we further propose SFS, which runs multiple file-system instances on the proposed virtualized storage devices, distributes the files under each directory among the underlying file-system instances, and stacks a unified namespace on top of them.
The evaluation of our prototype, built for Linux Containers (LXC) and run on a 32-core machine with both a RAM disk and a modern flash-based SSD, demonstrates that MultiLanes scales much better than Linux in micro- and macro-benchmarks, bringing significant performance improvements, and that MultiLanes with SFS further reduces contention within each container.
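The SFS idea described above (several file-system instances behind one namespace) can be illustrated with a small user-space sketch. This is not the authors' implementation: the abstract does not say how files are assigned to instances, so the hash-based placement here is an assumption, and the names `FsInstance` and `UnifiedNamespace` are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class FsInstance:
    """Hypothetical stand-in for one underlying file-system instance."""
    name: str
    files: dict = field(default_factory=dict)  # full path -> contents


class UnifiedNamespace:
    """Toy unified namespace stacked over several instances."""

    def __init__(self, instances):
        self.instances = list(instances)

    def _pick(self, path):
        # Assumption: placement by a stable hash of the full path, so that
        # files under the same directory spread across the instances.
        index = zlib.crc32(path.encode()) % len(self.instances)
        return self.instances[index]

    def write(self, path, data):
        self._pick(path).files[path] = data

    def read(self, path):
        return self._pick(path).files[path]

    def listdir(self, directory):
        # Merge the direct children found in every instance into one view.
        prefix = directory.rstrip("/") + "/"
        names = set()
        for inst in self.instances:
            for path in inst.files:
                rest = path[len(prefix):]
                if path.startswith(prefix) and rest and "/" not in rest:
                    names.add(rest)
        return sorted(names)


if __name__ == "__main__":
    ns = UnifiedNamespace(FsInstance(f"fs{i}") for i in range(4))
    for i in range(8):
        ns.write(f"/data/file{i}", b"payload")
    print(ns.listdir("/data"))
    print({inst.name: len(inst.files) for inst in ns.instances})
```

Because each file lives in exactly one instance, operations on different files in the same directory can touch different instances and their separate locks, while `listdir` still presents a single directory to the caller.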
MultiLanes: Providing Virtualized Storage for OS-Level Virtualization on Manycores