research-article

Profile-guided deployment of stream programs on multicores

Published: 12 June 2012

Abstract

Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages target application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. In a stream program, computations are expressed as independent actors that communicate through FIFO data channels. A major challenge with stream programs is to load-balance actors among the available processing cores. The workload of a stream program is determined by actor execution times and by the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution-time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest as execution-time overhead.
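The workload model described above can be made concrete with a small sketch. It assumes a steady-state iteration in which each actor has a fixed execution time and each channel incurs a fixed communication cost only when its producer and consumer are mapped to different cores, with that cost charged to the consumer's core. The function names, the three-actor pipeline, and all cost values are hypothetical; in the setting of this paper the channel costs are exactly what must be obtained by profiling, whereas here they are simply given. Exhaustive search stands in for a real optimizer and is only feasible for toy graphs, since it visits `num_cores ** len(work)` mappings.

```python
from itertools import product


def core_loads(work, channels, assignment, num_cores):
    """Estimated load of each core for one steady-state iteration.

    work:       actor -> execution time (hypothetical units)
    channels:   (producer, consumer) -> communication cost, paid only when
                the two actors are mapped to different cores
    assignment: actor -> core index in range(num_cores)
    """
    loads = [0.0] * num_cores
    for actor, exec_time in work.items():
        loads[assignment[actor]] += exec_time
    for (producer, consumer), cost in channels.items():
        if assignment[producer] != assignment[consumer]:
            # Assumption: the consumer's core pays for pulling the data
            # through the cache-coherence protocol.
            loads[assignment[consumer]] += cost
    return loads


def makespan(work, channels, assignment, num_cores):
    """Under this model, the most heavily loaded core limits throughput."""
    return max(core_loads(work, channels, assignment, num_cores))


def best_assignment(work, channels, num_cores):
    """Exhaustive search over all num_cores ** len(work) mappings (toy sizes only)."""
    actors = sorted(work)
    best = None
    for cores in product(range(num_cores), repeat=len(actors)):
        assignment = dict(zip(actors, cores))
        span = makespan(work, channels, assignment, num_cores)
        if best is None or span < best[0]:
            best = (span, assignment)
    return best


# Hypothetical three-actor pipeline A -> B -> C mapped onto two cores.
work = {"A": 4, "B": 3, "C": 2}
channels = {("A", "B"): 1, ("B", "C"): 2}
span, mapping = best_assignment(work, channels, num_cores=2)
print(span, mapping)  # 6.0 {'A': 0, 'B': 1, 'C': 1}
```

In this toy instance a single core needs 9 units per iteration, while the best two-core mapping places A alone and B, C together for a makespan of 6 (A's core carries 4; the B, C core carries 5 plus 1 for the cut channel A→B). Cutting B→C instead is worse (7), because A and B together already carry 7 units of work.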

In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. To estimate the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. On average, our scheme requires only 17% of the profiling steps of an exhaustive profiling run over all data channels of a stream program.
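To illustrate what an ILP for this load-balancing problem can look like, here is a generic makespan-minimizing formulation. It is a sketch under stated assumptions, not the formulation from the paper. The symbols are illustrative: $V$ is the set of actors, $E$ the set of channels $(u,v)$ from producer $u$ to consumer $v$, $P$ the set of cores, $w_v$ the execution time of actor $v$, and $c_{uv}$ the communication cost of channel $(u,v)$. Charging a cut channel's cost to the consumer's core is also an assumption. The binary $x_{vp}$ is 1 iff actor $v$ runs on core $p$; $y_{uv}$ is forced to 1 whenever the channel crosses cores; $z_{uvp}$ is forced to 1 when the channel is cut and its consumer runs on $p$; $T$ is the load of the most heavily loaded core.

```latex
\begin{aligned}
\min\quad & T \\
\text{s.t.}\quad
& \sum_{p \in P} x_{vp} = 1 && \forall v \in V \\
& y_{uv} \ge x_{up} - x_{vp} && \forall (u,v) \in E,\ p \in P \\
& z_{uvp} \ge x_{vp} + y_{uv} - 1 && \forall (u,v) \in E,\ p \in P \\
& \sum_{v \in V} w_v \, x_{vp} + \sum_{(u,v) \in E} c_{uv} \, z_{uvp} \le T && \forall p \in P \\
& x_{vp},\ y_{uv} \in \{0,1\}, \quad z_{uvp} \ge 0
\end{aligned}
```

Because the objective only pushes $T$ down, the solver has no incentive to set $y_{uv}$ or $z_{uvp}$ higher than the lower bounds require. The constraints alone therefore make every cut channel add exactly $c_{uv}$ to its consumer's core. The hard part the abstract points to is obtaining realistic $c_{uv}$ values on a cache-coherent machine, which is what the profiling scheme addresses.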

    Published in

      ACM SIGPLAN Notices, Volume 47, Issue 5 (LCTES '12), May 2012
      152 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2345141

      LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
      June 2012
      153 pages
      ISBN: 9781450312127
      DOI: 10.1145/2248418

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
