Abstract
We present a methodology for pipelined software synthesis of streaming applications. First, we develop a versatile task assignment algorithm that can optimize realistically arbitrary cost functions for two cores. Unlike existing heuristics, the algorithm is exact (i.e., theoretically optimal). Second, our approximation technique provides an adjustable knob for trading solution quality against algorithm runtime and memory. Third, we develop a recursive heuristic for more than two cores. FPGA-based emulation experiments validate our theoretical results. The exact algorithm yields a 1.7× throughput improvement. The approximation method offers a range of tradeoff points (e.g., 3× faster with 20× less memory) while degrading throughput by only 1% to 5%.
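The abstract does not describe the algorithms themselves, so the following is only an illustrative sketch of the problem setting, not the authors' method. It assumes a hypothetical cost model (the bottleneck load of the busier core), uses brute-force enumeration as a stand-in for an exact two-core assignment, and applies that split recursively as one possible way to reach more cores. The names `best_two_core_split`, `recursive_split`, `bottleneck`, and the `WORK` table are all invented for illustration.

```python
# Hypothetical per-task work estimates (arbitrary units); not from the paper.
WORK = {"a": 4, "b": 3, "c": 2, "d": 1}


def bottleneck(core0, core1):
    """Assumed cost model: the load of the busier core.

    In a pipelined execution the slowest stage limits throughput,
    so a lower bottleneck means a higher throughput.
    """
    return max(sum(WORK[t] for t in core0), sum(WORK[t] for t in core1))


def best_two_core_split(tasks, cost):
    """Try every two-way assignment and keep the cheapest one.

    Brute force is exact for any cost function but exponential in the
    number of tasks; it only stands in for an exact algorithm here.
    """
    tasks = list(tasks)
    best = None
    for mask in range(2 ** len(tasks)):
        core0 = tuple(t for i, t in enumerate(tasks) if (mask >> i) & 1)
        core1 = tuple(t for i, t in enumerate(tasks) if not (mask >> i) & 1)
        c = cost(core0, core1)
        if best is None or c < best[0]:
            best = (c, core0, core1)
    return best


def recursive_split(tasks, cores, cost):
    """Heuristic for a power-of-two core count: bisect, then recurse."""
    if cores == 1:
        return [tuple(tasks)]
    _, left, right = best_two_core_split(tasks, cost)
    half = cores // 2
    return recursive_split(left, half, cost) + recursive_split(right, half, cost)


if __name__ == "__main__":
    print(best_two_core_split(WORK, bottleneck))
    print(recursive_split(WORK, 4, bottleneck))
```

Because `cost` is passed in as a plain function, the same search works with any cost model, for example one that also charges for communication between the two cores. The recursive step is a heuristic: each bisection is optimal on its own, but the combined assignment need not be.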
Index Terms
Throughput-driven synthesis of embedded software for pipelined execution on multicore architectures