research-article

Parallelizing Industrial Hard Real-Time Applications for the parMERASA Multicore

Published: 23 May 2016

Abstract

The EC project parMERASA (Multicore Execution of Parallelized Hard Real-Time Applications Supporting Analyzability) investigated timing-analyzable parallel hard real-time applications running on a predictable multicore processor. A pattern-supported parallelization approach was developed to ease sequential-to-parallel program transformation, based on parallel design patterns that are timing analyzable. The approach was applied to the following industrial hard real-time programs: 3D path planning and stereo navigation algorithms (Honeywell International s.r.o.), a control algorithm for a dynamic compaction machine (BAUER Maschinen GmbH), and a diesel engine management system (DENSO AUTOMOTIVE Deutschland GmbH). This article focuses on the parallelization approach, the experience gained while parallelizing the applications, and the quantitative results obtained by simulation, by static WCET analysis with the OTAWA tool, and by measurement-based WCET analysis with the RapiTime tool.
