A machine learning approach to mapping streaming workloads to dynamic multicore processors

Published: 13 June 2016

Abstract

Dataflow programming languages facilitate the design of data-intensive programs such as the streaming applications commonly found in embedded systems. They also expose parallelism that can be exploited using multicore processors, which are now part of the mobile landscape. In recent years, a shift has occurred towards heterogeneity (e.g., ARM big.LITTLE) and reconfigurability. Dynamic Multicore Processors (DMPs) bridge the gap between fully reconfigurable processors and homogeneous multicore systems. They can re-allocate their resources at runtime to create larger, more powerful logical processors fine-tuned to the workload. Unfortunately, there exists no accurate method to determine how to partition the cores in a DMP among application threads. Programmers often rely on analyzing the application manually and applying a set of hand-picked heuristics. This leads to sub-optimal performance, reducing the potential of DMPs. What is needed is a way to determine the optimal partitioning and grouping of resources to maximize performance. As a first step, this paper studies the effect of thread partitioning and hardware resource allocation on a set of StreamIt applications. We show that the resulting space is not trivial and exhibits a large performance variation depending on the combination of parameters. We introduce a machine-learning-based methodology to tackle the space complexity. Our machine-learning model is able to directly predict the best combination of parameters using static code features. The predicted set of parameters leads to performance on par with the best performance found in a space of more than 32,000 configurations per application.

Published in

ACM SIGPLAN Notices, Volume 51, Issue 5 (LCTES '16)
May 2016
122 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980930
Editor: Andy Gill

LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems
June 2016
122 pages
ISBN: 9781450343169
DOI: 10.1145/2907950

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

article