Abstract
Careful scheduling of memory operations matters more as memory performance becomes increasingly important to overall system performance. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, (2) it provides a mechanism for combining multiple constraints, which is important for increasingly complex DRAM structures, and (3) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller.
We have previously evaluated this scheduler in the context of the IBM Power5. When compared with the state of the art, this scheduler improves performance by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively. This article expands our understanding of the AHB scheduler in a variety of ways. Looking backwards, we describe the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler is superior for the IBM Power5, which we argue will be representative of future microprocessor memory controllers. Looking forwards, we evaluate this scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of this scheduler increases as we move to multithreaded environments.
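The third benefit listed in the abstract, matching the program's mixture of Reads and Writes, can be illustrated with a toy model. The sketch below is not the AHB scheduler as implemented for the Power5; it is a minimal illustration that assumes a short window of recent command types and a known target Read fraction. The names `pick_next`, `schedule`, `target_read_fraction`, and `history_len` are hypothetical.

```python
from collections import deque


def pick_next(history, reads_pending, writes_pending, target_read_fraction):
    """Choose 'R' or 'W' so the recent command mix stays near the target.

    history: recent command types ('R' or 'W'), oldest first.
    Returns None when nothing is pending.
    """
    candidates = []
    if reads_pending:
        candidates.append("R")
    if writes_pending:
        candidates.append("W")
    if not candidates:
        return None

    def deviation(cmd):
        # Read fraction of the history window if cmd were issued next.
        window = list(history) + [cmd]
        read_fraction = window.count("R") / len(window)
        return abs(read_fraction - target_read_fraction)

    # On a tie, min() keeps the first candidate, so Reads win ties.
    return min(candidates, key=deviation)


def schedule(reads, writes, target_read_fraction, history_len=4):
    """Drain both queues, consulting a bounded history at every step."""
    history = deque(maxlen=history_len)  # hypothetical history length
    reads, writes = deque(reads), deque(writes)
    order = []
    while reads or writes:
        kind = pick_next(history, bool(reads), bool(writes), target_read_fraction)
        op = reads.popleft() if kind == "R" else writes.popleft()
        order.append(op)
        history.append(kind)
    return order


if __name__ == "__main__":
    print(schedule(["r0", "r1", "r2", "r3"], ["w0", "w1"], 2 / 3))
```

With a target of two Reads for every Write, this toy policy issues the commands in the order `r0, w0, r1, r2, w1, r3`, interleaving Writes rather than letting them accumulate behind the Reads. A real controller would also weigh DRAM timing constraints such as bank conflicts, which this sketch ignores.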
Index Terms
Memory scheduling for modern microprocessors