Abstract
Most hardware and software vendors suggest disabling hardware prefetching in virtualized environments. They claim that prefetching is detrimental to application performance due to inaccurate prediction caused by workload diversity and VM interference on shared cache. However, no comprehensive, quantitative measurements have been performed to support this belief.
This paper is the first to systematically measure the influence of hardware prefetching in virtualized environments. We examine a wide variety of benchmarks on three types of chip-multiprocessors (CMPs) to analyze the hardware prefetching performance. We conduct extensive experiments by taking into account a number of important virtualization factors. We find that hardware prefetching has minimal destructive influence under most configurations. Only with certain application combinations does prefetching influence the overall performance.
To leverage these findings and make hardware prefetching effective across a diversity of virtualized environments, we propose a dynamic prefetching-aware VCPU-core binding approach (PAVCB), which includes two phases: classifying and binding. The workload of each VM is first classified into one of several cache sharing constraint categories based on its cache access characteristics, considering both prefetch requests and demand requests. Then, following heuristic rules, the VCPUs of each VM are scheduled onto appropriate cores subject to those cache sharing constraints. We show that the proposed approach can improve performance by 12% on average over the default scheduler and 46% over manual system administrator bindings across different workload combinations in the presence of hardware prefetching.
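The two-phase idea (classify, then bind) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the actual categories, thresholds, or heuristic rules, so the two category names, the `heavy_threshold` value, the per-kilo-instruction metrics, and the greedy placement rule below are all hypothetical choices made only to show the overall shape of such a scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Hypothetical cache sharing constraint categories (not from the paper)."""
    SHARE_FREELY = 0   # little shared-cache traffic
    AVOID_SHARING = 1  # heavy shared-cache traffic (prefetch plus demand)


@dataclass
class VMStats:
    name: str
    vcpus: int
    demand_per_kinst: float    # assumed metric: demand requests per 1000 instructions
    prefetch_per_kinst: float  # assumed metric: prefetch requests per 1000 instructions


def classify(vm, heavy_threshold=5.0):
    """Phase 1: classify a VM using both prefetch and demand requests.

    The threshold is an illustrative assumption, not a measured value.
    """
    total = vm.demand_per_kinst + vm.prefetch_per_kinst
    if total >= heavy_threshold:
        return Category.AVOID_SHARING
    return Category.SHARE_FREELY


def bind(vms, domains):
    """Phase 2: greedily bind VCPUs to cores.

    `domains` lists the core ids that share one cache, e.g. [[0, 1], [2, 3]].
    Assumed rule: spread cache-heavy VCPUs across cache domains first, then
    fill the remaining cores with light VCPUs.
    """
    free = [list(cores) for cores in domains]
    heavy_count = [0] * len(domains)
    binding = {}
    # Heavy VMs are placed first (False sorts before True; the sort is stable).
    ordered = sorted(vms, key=lambda vm: classify(vm) is Category.SHARE_FREELY)
    for vm in ordered:
        heavy = classify(vm) is Category.AVOID_SHARING
        for vcpu in range(vm.vcpus):
            candidates = [d for d in range(len(domains)) if free[d]]
            if not candidates:
                raise ValueError("not enough cores for all VCPUs")
            if heavy:
                # Pick the domain that currently hosts the fewest heavy VCPUs.
                d = min(candidates, key=lambda i: heavy_count[i])
                heavy_count[d] += 1
            else:
                # Light VCPUs take the free cores next to heavy ones.
                d = max(candidates, key=lambda i: heavy_count[i])
            binding[(vm.name, vcpu)] = free[d].pop(0)
    return binding


if __name__ == "__main__":
    vms = [
        VMStats("stream-like", 1, 2.0, 20.0),
        VMStats("pointer-chasing", 1, 30.0, 3.0),
        VMStats("idle-1", 1, 1.0, 0.5),
        VMStats("idle-2", 1, 0.5, 0.2),
    ]
    for (name, vcpu), core in bind(vms, [[0, 1], [2, 3]]).items():
        print(f"{name} vcpu{vcpu} -> core {core}")
```

In this sketch the two cache-heavy VMs end up in different cache domains and each is paired with a light VM. A real deployment would apply the resulting map through the hypervisor's own VCPU pinning facility and would re-run both phases as workload behavior changes, since the approach described above is dynamic.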
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems