Communication Optimizations for Multithreaded Code Generation from Simulink Models

Published: 21 May 2015

Abstract

Communication frequency is increasing with the growing complexity of emerging embedded applications and the number of processors in the implemented multiprocessor SoC architectures. In this article, we consider how to reduce communication cost during multithreaded code generation from partitioned Simulink models, to help designers optimize code and improve system performance. We first propose a technique that combines message aggregation and communication pipelining: it groups communications that share the same source and destination, and it overlaps communication tasks with computation tasks. We also present a method that applies static analysis and dynamic emulation to allocate communication buffers efficiently, which further reduces synchronization cost and increases processor utilization. Cyclic dependencies in the mapped model may hinder the effectiveness of these two techniques. We therefore propose a further set of optimizations: repartitioning with strongly connected threads to maximize the degree of communication reduction, and preprocessing strategies that use the delays available in the model to reduce the number of communication channels that cannot be optimized. Experimental results demonstrate the advantages of the proposed optimizations, with throughput improvements of 11% to 143%.
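
The message aggregation idea in the abstract, grouping communications that share a source and a destination, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_channels`, the tuple layout `(source thread, destination thread, signal name, size in bytes)`, and the thread and signal names are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_channels(channels):
    """Group point-to-point channels by their (source, destination) pair.

    `channels` is an iterable of hypothetical tuples:
    (src_thread, dst_thread, signal_name, size_bytes).
    Each group stands for one aggregated message that would replace
    several separate transfers between the same two threads.
    """
    groups = defaultdict(list)
    for src, dst, signal, size in channels:
        groups[(src, dst)].append((signal, size))

    # Summarize each group: which signals travel together and how large
    # the combined payload is.
    return {
        pair: {
            "signals": [signal for signal, _ in items],
            "total_bytes": sum(size for _, size in items),
        }
        for pair, items in groups.items()
    }


# Hypothetical inter-thread channels of a partitioned model.
channels = [
    ("T0", "T1", "a", 4),
    ("T0", "T1", "b", 8),
    ("T0", "T2", "c", 4),
    ("T1", "T2", "d", 2),
    ("T0", "T1", "e", 4),
]

messages = aggregate_channels(channels)
for (src, dst), info in messages.items():
    print(src, "->", dst, info["signals"], info["total_bytes"], "bytes")
```

In this toy input, five separate transfers collapse into three aggregated messages, one per distinct source and destination pair. The sketch covers only the grouping step; it does not model the communication pipeline, buffer allocation, or the handling of cyclic dependencies that the abstract also mentions.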

