Abstract
Software debugging is one of the most challenging aspects of embedded system development due to growing hardware and software complexity, limited visibility of system components, and tightening time-to-market deadlines. To find software bugs faster, developers often rely on on-chip trace modules with large buffers to capture program execution traces with minimal interference with program execution. However, the high volume of trace data and the high cost of trace modules limit visibility into system operation to short program segments. This article introduces a new hardware/software technique for capturing and filtering read data value traces in multicores that enables a complete reconstruction of parallel program execution. The proposed technique exploits tracking of data reads in data caches and cache coherence protocol states to minimize the number of trace messages streamed out of the target platform to the software debugger. The effectiveness of the proposed technique is determined by analyzing the required trace port bandwidth and trace buffer sizes as a function of the data cache size and the number of processor cores. The results show that the proposed technique reduces the required trace port bandwidth by a factor of 12.2 to 73.9 when compared to Nexus-like read data value tracing, thus enabling continuous on-the-fly data tracing at modest hardware cost.
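The filtering idea described in the abstract can be illustrated with a toy model. The sketch below is not the article's design: it assumes one "already traced" flag per cached word, a write-invalidate coherence protocol, and that values written by a core can be recomputed by the debugger during replay. The names `TracedCache`, `Multicore`, and `WORDS_PER_LINE` are hypothetical, and cache capacity, evictions, message encoding, and timestamps are not modeled.

```python
WORDS_PER_LINE = 8  # assumed cache line size in words (hypothetical value)


class TracedCache:
    """Toy per-core data cache holding one 'already traced' flag per word."""

    def __init__(self):
        self.flags = {}  # line address -> set of word offsets already traced

    def read(self, addr):
        """Return True if this read must emit a trace message."""
        line, word = divmod(addr, WORDS_PER_LINE)
        seen = self.flags.setdefault(line, set())
        if word in seen:
            return False  # value already known to the debugger: filtered
        seen.add(word)
        return True

    def local_write(self, addr):
        # Assumption: replay can recompute locally written values,
        # so a later read of this word needs no trace message.
        line, word = divmod(addr, WORDS_PER_LINE)
        self.flags.setdefault(line, set()).add(word)

    def invalidate(self, addr):
        # A remote write invalidates the line and clears its flags.
        self.flags.pop(addr // WORDS_PER_LINE, None)


class Multicore:
    def __init__(self, n_cores):
        self.caches = [TracedCache() for _ in range(n_cores)]
        self.reads = 0
        self.messages = 0

    def read(self, core, addr):
        self.reads += 1
        if self.caches[core].read(addr):
            self.messages += 1

    def write(self, core, addr):
        for i, cache in enumerate(self.caches):
            if i == core:
                cache.local_write(addr)
            else:
                cache.invalidate(addr)


mc = Multicore(2)
for _ in range(3):
    mc.read(0, 16)  # first read is traced, the next two are filtered
mc.write(1, 16)     # core 1 writes; core 0's copy is invalidated
mc.read(0, 16)      # traced again: core 0 must observe the new value
mc.read(1, 16)      # filtered: core 1 produced this value itself
print(mc.reads, mc.messages)
```

In this small run, five reads produce two trace messages. Unfiltered read data value tracing would emit a message for every read, which is the kind of gap the reported bandwidth reduction reflects.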
Index Terms
Enabling On-the-Fly Hardware Tracing of Data Reads in Multicores