Abstract
The need for higher application performance in high-integrity systems, such as those in avionics, is rising as software implements increasingly complex functionality. Multi-processor systems-on-chip (MPSoCs) are the prevalent computing solution for future high-integrity embedded products. MPSoCs include multicore central processing units (CPUs) that improve performance through thread-level parallelism. They also include generic accelerators, such as graphics processing units (GPUs), and application-specific accelerators. However, the data processing approach (DPA) required to exploit each of these parallel hardware blocks faces several open challenges that hinder its safe deployment in high-integrity domains. The main challenges are the qualification of the associated runtime system and the difficulty of analyzing programs that use the DPA with out-of-the-box timing analysis and code coverage tools. In this work, we perform a thorough analysis of vector extensions (VExts) in current commercial off-the-shelf (COTS) processors for high-integrity systems. We show that VExts avoid many of the challenges that arise with parallel programming models and GPUs. Unlike other DPAs, VExts require no runtime support, prevent by design the race conditions that can arise with parallel programming models, and have minimal impact on the software ecosystem, which enables the use of existing code coverage and timing analysis tools. We develop vectorized versions of neural network kernels and show that the VExts of the NVIDIA Xavier provide a reasonable increase in guaranteed application performance of up to 2.7x. Our analysis contends that VExts are arguably the DPA with the fastest path to adoption in high-integrity systems.
Vector Extensions in COTS Processors to Increase Guaranteed Performance in Real-Time Systems