Abstract
Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to expose parallelism. Current branch predictors seek to incorporate large amounts of control-flow history to maximize accuracy. However, when that history is absent, the predictor fails to work as intended. Thus, modern predictors are almost useless for threads below a certain length.
Using a Speculative Multithreaded (SpMT) architecture as an example of a system that generates shorter threads, this work examines techniques to improve branch prediction accuracy when a new thread begins to execute on a different core. This paper proposes a minor change to the branch predictor that gives virtually the same performance on short threads as an idealized predictor that incorporates the unknowable pre-history of a spawned speculative thread. At the same time, strong performance on long threads is preserved. The proposed technique sets the global history register of the spawned thread to the initial value of the program counter. This novel and simple design reduces branch mispredictions by 29% and provides as much as a 13% IPC improvement on selected SPEC2000 benchmarks.
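The history-seeding idea can be illustrated with a small sketch. The abstract does not say which predictor organization the paper evaluates, so the gshare-style table, the 2-bit counters, the history width, and every name below (`GsharePredictor`, `seed_history_from_pc`) are assumptions made only for illustration. Dropping the two low PC bits before seeding is likewise an assumption for a 4-byte-aligned instruction set.

```python
class GsharePredictor:
    """Toy gshare-style predictor (an assumed organization, not the paper's)."""

    def __init__(self, history_bits=12):
        self.history_bits = history_bits
        self.mask = (1 << history_bits) - 1
        # One 2-bit saturating counter per entry, initialized to weakly taken.
        self.table = [2] * (1 << history_bits)
        self.ghr = 0  # global history register

    def seed_history_from_pc(self, start_pc):
        # Called when a thread is spawned on this core: replace the stale
        # history with bits of the thread's starting PC (assumed: drop the
        # two alignment bits, keep the low history_bits bits).
        self.ghr = (start_pc >> 2) & self.mask

    def _index(self, pc):
        # Standard gshare indexing: XOR branch address bits with the history.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        # Predict taken when the counter is in one of its two upper states.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Train the counter, then shift the outcome into the history.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

In this sketch, every thread spawned at the same start PC begins with the same history value regardless of what ran on the core beforehand, so its first branches index the same table entries each time and can reuse earlier training.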
Accurate branch prediction for short threads
ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems