research-article

Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems

Published: 16 March 2013

Abstract

We propose M3 (M-cube, Multi-core Multi-bank Memory allocator), a novel kernel-level memory allocator with two features. First, it introduces the notion of a memory container, defined as the unit of memory comprising the minimum number of page frames that covers all banks of the memory organization; by assigning containers exclusively to cores, M3 lets each core exploit as much bank parallelism as possible. Second, it orchestrates page frame allocation so that the pages a thread accesses are dispersed randomly across multiple banks, which randomizes each thread's access pattern. The development of M3 is based on a tool we developed to understand the architectural characteristics of the underlying memory organization. Using an extension of this tool, we observe that an application accessing pages in random order outperforms the same application accessing pages in a regular pattern, such as sequential or same-ordered accesses. This is because randomized accesses reduce inter-thread interference on the row buffers of the memory banks. We implement M3 in Linux kernel 2.6.32 on an Intel Xeon system with 16 cores and 32 GB of DRAM. Performance evaluation with various workloads shows that M3 improves the overall performance of memory-intensive benchmarks by up to 85%, and by about 40% on average.
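The two mechanisms described above can be sketched in a few lines of Python. This is a hypothetical illustration, not M3's kernel code: `bank_of`, `build_containers`, `CoreAllocator`, and `NUM_BANKS = 16` are invented for the example, and the frame-to-bank mapping is an assumption (each page frame maps to exactly one bank via an XOR of page-frame-number bits), whereas M3 relies on its own tool to characterize the real memory organization. Under that assumption, a container is simply one free frame per bank.

```python
import random
from collections import defaultdict

# Hypothetical parameter: number of DRAM banks the allocator can see.
NUM_BANKS = 16


def bank_of(pfn: int) -> int:
    """Hypothetical frame-to-bank mapping (an assumption, not Xeon's real one).

    Each page frame is assumed to map to exactly one bank, chosen by XOR-ing
    two 4-bit fields of the page frame number (pfn).
    """
    return (pfn ^ (pfn >> 4)) & (NUM_BANKS - 1)


def build_containers(free_pfns):
    """Group free frames into containers holding one frame per bank.

    Under the one-bank-per-frame assumption, NUM_BANKS frames is the minimum
    number of frames that covers every bank.
    """
    by_bank = defaultdict(list)
    for pfn in free_pfns:
        by_bank[bank_of(pfn)].append(pfn)
    containers = []
    # Keep forming containers while every bank still has a free frame.
    while all(by_bank[b] for b in range(NUM_BANKS)):
        containers.append([by_bank[b].pop() for b in range(NUM_BANKS)])
    return containers


class CoreAllocator:
    """Per-core allocator over containers assigned exclusively to one core."""

    def __init__(self, containers, rng):
        # Flatten the core's containers into one pool of free frames.
        self.free = [pfn for c in containers for pfn in c]
        self.rng = rng

    def alloc_page(self) -> int:
        """Return a uniformly random free frame rather than the next in order."""
        if not self.free:
            raise MemoryError("core has no free frames left")
        i = self.rng.randrange(len(self.free))
        # Swap the chosen frame to the end so removal is O(1).
        self.free[i], self.free[-1] = self.free[-1], self.free[i]
        return self.free.pop()


if __name__ == "__main__":
    rng = random.Random(42)
    containers = build_containers(range(4096))
    # Hand out whole containers round-robin so no two cores share a frame.
    cores = [CoreAllocator(containers[i::4], rng) for i in range(4)]
    pages = [cores[0].alloc_page() for _ in range(8)]
    print("core 0 pages:", pages)
    print("banks:       ", [bank_of(p) for p in pages])
```

Because every container spans all banks, each core can still keep all banks busy, while the random pick within the core's pool breaks up sequential or identically ordered page sequences across threads. The flat Python list stands in for whatever free-list structure a kernel would use; it is not a model of M3's actual data structures.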

  • Published in

    ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
    April 2013
    540 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2499368

    • ACM Conferences
      ASPLOS '13: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2013
      574 pages
      ISBN: 9781450318709
      DOI: 10.1145/2451116

    Copyright © 2013 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States
