ABSTRACT
Many hardware optimizations rely on collecting information about program behavior at runtime and storing it in lookup tables. To be accurate and effective, these optimizations usually require large dedicated on-chip tables. Although technology advances provide an increasing amount of on-chip resources, these resources are allocated to enlarging the conventional on-chip cache hierarchy.
This work proposes Predictor Virtualization, a technique that uses the existing memory hierarchy to emulate large predictor tables. We demonstrate its benefits by virtualizing a state-of-the-art data prefetcher. Full-system, cycle-accurate simulations show that the virtualized prefetcher preserves the performance benefits of the original design while reducing the on-chip storage dedicated to the predictor table from 60KB to less than one kilobyte.
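The abstract does not spell out how the memory hierarchy emulates a large table. As a rough illustration only, the sketch below assumes one plausible organization: a small direct-mapped on-chip table caches entries of a full-size predictor table kept in ordinary memory, fetching an entry on a miss and writing a modified entry back when it is evicted. The names `BackingStore`, `Slot`, `VirtualizedTable`, `lookup`, and `update` are hypothetical; memory latency, packing of entries into cache blocks, and the prefetcher's own prediction logic are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


class BackingStore:
    """Hypothetical stand-in for the memory hierarchy: holds the full table."""

    def __init__(self, num_entries: int) -> None:
        self.entries: List[int] = [0] * num_entries
        self.reads = 0   # entries fetched because of on-chip misses
        self.writes = 0  # dirty entries written back on eviction

    def read(self, index: int) -> int:
        self.reads += 1
        return self.entries[index]

    def write(self, index: int, value: int) -> None:
        self.writes += 1
        self.entries[index] = value


@dataclass
class Slot:
    index: int           # which full-table entry this slot currently holds
    value: int           # predictor payload (e.g., a bit pattern)
    dirty: bool = False  # modified on chip, not yet written back


class VirtualizedTable:
    """Small direct-mapped on-chip table caching a large in-memory table."""

    def __init__(self, backing: BackingStore, onchip_entries: int) -> None:
        self.backing = backing
        self.slots: List[Optional[Slot]] = [None] * onchip_entries
        self.hits = 0
        self.misses = 0

    def _fetch(self, index: int) -> Slot:
        pos = index % len(self.slots)
        slot = self.slots[pos]
        if slot is not None and slot.index == index:
            self.hits += 1
            return slot
        self.misses += 1
        if slot is not None and slot.dirty:
            # Write the evicted, modified entry back to memory.
            self.backing.write(slot.index, slot.value)
        slot = Slot(index, self.backing.read(index))
        self.slots[pos] = slot
        return slot

    def lookup(self, index: int) -> int:
        return self._fetch(index).value

    def update(self, index: int, value: int) -> None:
        slot = self._fetch(index)
        slot.value = value
        slot.dirty = True


if __name__ == "__main__":
    mem = BackingStore(num_entries=1024)  # the "large" table lives in memory
    table = VirtualizedTable(mem, onchip_entries=8)
    table.update(3, 0b1011)   # miss: fetch entry 3, modify it on chip
    print(table.lookup(3))    # hit -> 11
    table.update(11, 0b0001)  # 11 % 8 == 3, so dirty entry 3 is evicted
    print(mem.entries[3])     # written back -> 11
    print(table.lookup(3))    # miss, reloaded from memory -> 11
    print(table.hits, table.misses, mem.reads, mem.writes)  # 1 3 3 2
```

In this sketch the dedicated on-chip state is only the eight-slot table, while the full 1024-entry table occupies ordinary memory; the cost of the trade shows up as the extra `reads` and `writes` that misses and evictions send to the backing store.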
Predictor virtualization (ASPLOS '08)