research-article

Accurate branch prediction for short threads

Published: 01 March 2008

Abstract

Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to expose parallelism. Current branch predictors seek to incorporate large amounts of control flow history to maximize accuracy. However, when that history is absent, the predictor fails to work as intended. Thus, modern predictors are almost useless for threads below a certain length.
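To make the dependence on history concrete, here is a minimal Python sketch of a gshare-style index, in which the pattern-history table is indexed by the branch address XORed with the global history register (GHR). The 12-bit index width, the branch address and the `>> 2` alignment shift are illustrative assumptions, not parameters taken from the paper. The same branch lands in a different table entry depending on the history, so a thread that starts with empty history does not reach the counters trained under the history that long-running code would have built up.

```python
INDEX_BITS = 12                      # assumed: 4096-entry pattern-history table
MASK = (1 << INDEX_BITS) - 1


def gshare_index(pc: int, ghr: int) -> int:
    """Table index = low bits of (word-aligned branch address XOR global history)."""
    return ((pc >> 2) ^ ghr) & MASK


def shift_history(ghr: int, taken: bool) -> int:
    """Shift one branch outcome into the global history register."""
    return ((ghr << 1) | int(taken)) & MASK


# Warm history: the last 12 branches executed before this one were all taken.
warm = 0
for _ in range(INDEX_BITS):
    warm = shift_history(warm, True)

pc = 0x400A3C                        # hypothetical branch address
print(hex(gshare_index(pc, warm)))   # 0xd70: entry trained by long-running code
print(hex(gshare_index(pc, 0)))      # 0x28f: entry a thread with empty history uses
```

Until a thread has executed as many branches as the history is long, its indices are still partly determined by whatever the GHR held when the thread started, which is the window in which a short thread spends most or all of its life.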

Using a Speculative Multithreaded (SpMT) architecture as an example of a system that generates shorter threads, this work examines techniques to improve branch prediction accuracy when a new thread begins to execute on a different core. This paper proposes a minor change to the branch predictor that gives virtually the same performance on short threads as an idealized predictor that incorporates the unknowable pre-history of a spawned speculative thread. At the same time, strong performance on long threads is preserved. The proposed technique sets the global history register of the spawned thread to the initial value of the program counter. This novel and simple design reduces branch mispredictions by 29% and provides as much as a 13% IPC improvement on selected SPEC2000 benchmarks.
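The spawn-time change can be sketched the same way. Below is a hypothetical Python toy: a gshare predictor with 2-bit saturating counters whose `spawn_thread` method seeds the GHR from the new thread's starting program counter. How the PC is folded into the history (here, dropping the two alignment bits and keeping the low 12 bits), the table size, the branch trace and the baseline (a GHR left holding unrelated, effectively random bits from earlier work on that core) are all assumptions for illustration; they are not the paper's configuration or its evaluated baselines.

```python
import random
from dataclasses import dataclass, field

HIST_BITS = 12                       # assumed history length / table index width
MASK = (1 << HIST_BITS) - 1


@dataclass
class GsharePredictor:
    """Toy gshare predictor: 2-bit counters indexed by (PC >> 2) XOR GHR."""
    pht: list = field(default_factory=lambda: [1] * (1 << HIST_BITS))  # weakly not-taken
    ghr: int = 0

    def _index(self, pc: int) -> int:
        return ((pc >> 2) ^ self.ghr) & MASK

    def predict(self, pc: int) -> bool:
        return self.pht[self._index(pc)] >= 2       # counter values 2 and 3 mean "taken"

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & MASK   # shift outcome into history

    def spawn_thread(self, start_pc: int) -> None:
        """Hypothetical hook: seed the new thread's GHR from its starting PC."""
        self.ghr = (start_pc >> 2) & MASK


START_PC = 0x1000                                          # hypothetical spawn target
BODY = [(START_PC + 4 * k, k % 2 == 0) for k in range(8)]  # short thread: (pc, taken)


def run(policy: str, spawns: int = 50, seed: int = 0) -> int:
    """Count mispredictions over repeated spawns of the same short thread."""
    rng = random.Random(seed)
    bp = GsharePredictor()
    misses = 0
    for _ in range(spawns):
        if policy == "pc":
            bp.spawn_thread(START_PC)
        else:
            bp.ghr = rng.getrandbits(HIST_BITS)    # stale bits left by unrelated work
        for pc, taken in BODY:
            misses += bp.predict(pc) != taken
            bp.update(pc, taken)
    return misses


print("stale history:", run("stale"))
print("PC-seeded:    ", run("pc"))   # 4: only the first spawn's taken branches miss
```

With the PC-seeded history, every spawn from the same start address replays the same index sequence, so counters trained by earlier spawns are reused and only the first spawn mispredicts. With stale history, the early indices are effectively random and the taken branches keep missing. One plausible reason for using the PC rather than a constant such as zero is that threads starting at different addresses then begin with different histories; this toy does not exercise that case.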

    • Published in
      ACM SIGARCH Computer Architecture News, Volume 36, Issue 1 (ASPLOS '08)
      March 2008, 339 pages
      ISSN: 0163-5964
      DOI: 10.1145/1353534
      • ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
        March 2008, 352 pages
        ISBN: 9781595939586
        DOI: 10.1145/1346281

      Copyright © 2008 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
