
Throughput-driven synthesis of embedded software for pipelined execution on multicore architectures

Published: 09 February 2009

Abstract

We present a methodology for pipelined software synthesis of streaming applications on multicore architectures. First, we develop a versatile task assignment algorithm for two cores that can optimize essentially arbitrary, realistic cost functions. Unlike existing heuristics, the algorithm is exact (i.e., theoretically optimal). Second, we develop an approximation technique that provides an adjustable knob for trading solution quality against algorithm runtime and memory usage. Third, we develop a recursive heuristic for assignment onto more than two cores. Experiments on an FPGA-based emulation platform validate our theoretical results: the exact algorithm yields a 1.7× throughput improvement, and the approximation method offers a range of tradeoff points (e.g., 3× faster with 20× less memory) while degrading throughput by only 1% to 5%.
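To make the two-core assignment problem concrete, the sketch below models a streaming application as a directed task graph and exhaustively searches for the best two-stage pipeline. It is not the authors' exact algorithm, which the abstract does not describe; plain enumeration is exponential in the number of tasks and only illustrates what "exact" means on a tiny graph. The task weights, edge volumes, and the `bottleneck_cost` model are hypothetical assumptions. The cost model is passed in as a function, so any other cost can be plugged in, echoing the abstract's point that the assignment should handle arbitrary cost functions.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Tuple

Edge = Tuple[str, str]
# Hypothetical signature: cost(tasks, edges, stage1_tasks) -> pipeline period
CostFn = Callable[[Dict[str, float], Dict[Edge, float], FrozenSet[str]], float]


def is_pipeline_cut(edges: Dict[Edge, float], first: FrozenSet[str]) -> bool:
    """In a two-stage pipeline, data may only flow forward: every edge stays
    inside one stage or goes from stage 1 to stage 2, never backward."""
    return all(not (u not in first and v in first) for (u, v) in edges)


def bottleneck_cost(tasks: Dict[str, float], edges: Dict[Edge, float],
                    first: FrozenSet[str]) -> float:
    """Assumed cost model: each core's period is its compute load plus the
    data crossing the cut (sent by stage 1, received by stage 2). The slower
    stage sets the pipeline period; throughput is 1 / period."""
    load1 = sum(w for t, w in tasks.items() if t in first)
    load2 = sum(w for t, w in tasks.items() if t not in first)
    comm = sum(w for (u, v), w in edges.items() if u in first and v not in first)
    return max(load1 + comm, load2 + comm)


def exact_two_core(tasks: Dict[str, float], edges: Dict[Edge, float],
                   cost: CostFn = bottleneck_cost) -> Tuple[float, FrozenSet[str]]:
    """Try every subset of tasks as stage 1, keep only valid pipeline cuts,
    and return (best cost, stage-1 tasks). Exponential: small graphs only."""
    names = sorted(tasks)
    best = None
    for bits in product((False, True), repeat=len(names)):
        first = frozenset(n for n, on in zip(names, bits) if on)
        if not is_pipeline_cut(edges, first):
            continue
        c = cost(tasks, edges, first)
        if best is None or c < best[0]:
            best = (c, first)
    return best  # never None: the empty stage-1 set is always a valid cut


# Hypothetical four-task stream graph: a -> {b, c} -> d
tasks = {"a": 4, "b": 3, "c": 5, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 1, ("c", "d"): 1}

period, stage1 = exact_two_core(tasks, edges)
print(period, sorted(stage1))  # 10 ['a', 'b']
```

Here placing `a` and `b` on the first core balances both stages at a period of 10, whereas the "obvious" split `{a, c}` gives 11 because the heavier task `c` unbalances the stages. A practical exact method must reach the same optimum without enumerating all 2^n subsets, which is where a dedicated algorithm and the approximation knob described in the abstract come in.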

