Abstract
Content-based page sharing techniques improve memory efficiency in virtualized systems by identifying and merging identical pages. Kernel Same-page Merging (KSM), a Linux kernel utility for page sharing, sequentially scans the memory pages of virtual machines to deduplicate them. Sequential scanning has several undesirable side effects: CPU cycles are wasted when no sharing opportunities exist, and the rate at which sharing is discovered depends on the scanning rate and, in turn, on CPU availability. In this work, we exploit the presence of GPUs on modern systems to enable rapid memory sharing through targeted scanning of pages. Our solution, Catalyst, works in two phases: in the first, the GPU processes the pages of virtual machines to identify pages likely to be shareable; in the second, page-level similarity checks are performed only on this targeted set of candidate pages. Opportunistic use of the GPU to produce sharing hints enables rapid, low-overhead duplicate detection and sharing of memory pages in virtualized environments. We evaluate Catalyst on various benchmarks and workloads and demonstrate that it achieves higher memory sharing in less time than KSM under different scan-rate configurations, at lower or comparable compute cost.
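The two-phase structure described above (a bulk pass that produces sharing hints, followed by exact checks restricted to the hinted pages) can be illustrated with a minimal Python sketch. It assumes 4 KiB pages held as `bytes` objects and uses SHA-1 digests as the hint; the hash function, the function names, and the CPU-only first phase are illustrative assumptions, not Catalyst's actual GPU kernel or its interface with KSM. The second phase byte-compares pages that share a digest, so a hash collision can never cause two different pages to be merged.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes


def hash_pages(pages):
    """Phase 1 (hypothetical stand-in for the GPU pass): one digest per page.

    Catalyst offloads page processing to the GPU; this sketch runs on the CPU.
    """
    return [hashlib.sha1(page).digest() for page in pages]


def sharing_hints(digests):
    """Group page indices by digest; only groups with more than one page are hints."""
    buckets = defaultdict(list)
    for idx, digest in enumerate(digests):
        buckets[digest].append(idx)
    return [idxs for idxs in buckets.values() if len(idxs) > 1]


def merge_candidates(pages, hints):
    """Phase 2: byte-compare only hinted pages.

    Returns a mapping {duplicate_index: canonical_index}. Pages in the same
    hint group that differ byte-wise (a hash collision) stay unmerged.
    """
    mapping = {}
    for group in hints:
        canonical = []  # indices of distinct contents seen in this group
        for idx in group:
            for c in canonical:
                if pages[c] == pages[idx]:
                    mapping[idx] = c
                    break
            else:
                canonical.append(idx)
    return mapping


if __name__ == "__main__":
    pages = [
        b"\x00" * PAGE_SIZE,  # zero page
        b"a" * PAGE_SIZE,
        b"\x00" * PAGE_SIZE,  # duplicate of page 0
        b"b" * PAGE_SIZE,
        b"a" * PAGE_SIZE,     # duplicate of page 1
    ]
    hints = sharing_hints(hash_pages(pages))
    merged = merge_candidates(pages, hints)
    print("hints:", hints)
    print("merged:", merged, "pages saved:", len(merged))
```

Page 3 has no duplicate, so it never enters the comparison phase at all; this mirrors the motivation for targeted scanning, where pages without sharing candidates are not revisited by the exact-check step.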
Catalyst: GPU-assisted rapid memory deduplication in virtualization environments
VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments