research-article

To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach

Published: 16 March 2013

Abstract

Most hardware and software vendors suggest disabling hardware prefetching in virtualized environments. They claim that prefetching is detrimental to application performance due to inaccurate prediction caused by workload diversity and VM interference on shared cache. However, no comprehensive or quantitative measurements to support this belief have been performed.

This paper is the first to systematically measure the influence of hardware prefetching in virtualized environments. We examine a wide variety of benchmarks on three types of chip-multiprocessors (CMPs) to analyze the hardware prefetching performance. We conduct extensive experiments by taking into account a number of important virtualization factors. We find that hardware prefetching has minimal destructive influence under most configurations. Only with certain application combinations does prefetching influence the overall performance.

To leverage these findings and make hardware prefetching effective across a diversity of virtualized environments, we propose a dynamic prefetching-aware VCPU-core binding approach (PAVCB), which includes two phases: classifying and binding. The workload of each VM is classified into different cache sharing constraint categories based upon its cache access characteristics, considering both prefetch requests and demand requests. Then, following heuristic rules, the VCPUs of each VM are scheduled onto appropriate cores subject to cache sharing constraints. We show that the proposed approach can improve performance by 12% on average over the default scheduler and 46% over manual system administrator bindings across different workload combinations in the presence of hardware prefetching.
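The two-phase idea (classify, then bind) can be illustrated with a minimal sketch. This is not the paper's PAVCB algorithm: the abstract does not give the actual categories, metrics, or heuristic rules, so everything here is assumed for illustration. The `VM` fields, the per-kilo-instruction metrics, the `heavy`/`light` thresholds, the category names, and the greedy "least-loaded shared-cache domain first" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    demand_mpki: float   # demand misses per kilo-instruction (hypothetical metric)
    prefetch_pki: float  # prefetch requests per kilo-instruction (hypothetical metric)


def pressure(vm):
    """Combined cache pressure from demand and prefetch requests (assumed definition)."""
    return vm.demand_mpki + vm.prefetch_pki


def classify(vm, heavy=20.0, light=5.0):
    """Phase 1: bucket a VM by cache pressure. Thresholds are made up."""
    p = pressure(vm)
    if p >= heavy:
        return "heavy"
    if p <= light:
        return "light"
    return "moderate"


def bind(vms, cache_domains):
    """Phase 2: place VCPUs, heaviest VMs first, on the shared-cache
    domain with the lowest accumulated pressure that still has a free core.

    cache_domains maps a domain name to the list of core ids sharing that cache.
    Returns a dict mapping (vm_name, vcpu_index) to a core id.
    """
    free = {d: list(cores) for d, cores in cache_domains.items()}
    load = {d: 0.0 for d in cache_domains}
    binding = {}
    for vm in sorted(vms, key=pressure, reverse=True):
        for i in range(vm.vcpus):
            candidates = [d for d in free if free[d]]
            if not candidates:
                raise ValueError("not enough cores for all VCPUs")
            target = min(candidates, key=lambda d: load[d])
            binding[(vm.name, i)] = free[target].pop(0)
            load[target] += pressure(vm)
    return binding
```

With two shared-cache domains of two cores each, this rule tends to put two high-pressure VMs on different domains and fill the remaining cores with low-pressure VMs, which is one plausible reading of "scheduled onto appropriate cores subject to cache sharing constraints".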

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13), April 2013, 540 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, March 2013, 574 pages. ISBN: 9781450318709. DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
