Abstract
In recent years, on-chip trace generation has been recognized as a solution to the debugging of increasingly complex software. An execution trace can be seen as the most fundamentally useful type of trace, allowing the execution path of software to be determined post hoc. However, the bandwidth required to output such a trace can be excessive. Our architecture-aware trace compression (AATC) scheme adds an on-chip branch predictor and branch target buffer to reduce the volume of execution trace data in real time through on-chip compression. Novel redundancy reduction strategies are employed, most notably by exploiting the widespread use of linked branches and the compiler-driven movement of return addresses between link register, stack, and program counter. This reduces the volume of branch target addresses by 52%, and other algorithmic improvements decrease trace volume further. An analysis of spatial and temporal redundancy in the trace stream allows encoding strategies to be compared so that compression performance can be increased systematically. A combination of differential, Fibonacci, VarLen, and Move-to-Front encodings is chosen to produce two compressor variants: a performance-focused xAATC that encodes 56.5 instructions/bit using 24,133 gates and an area-efficient fAATC that encodes 48.1 instructions/bit using only 9,854 gates.
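As a rough illustration of three of the encodings named above, the following Python sketch shows textbook versions of differential, Fibonacci, and Move-to-Front coding. It is not the AATC hardware design: the function names, the sample addresses, and the initial Move-to-Front table are all assumptions chosen for the example, and the VarLen encoding is omitted because the abstract does not describe it.

```python
def differential_encode(values, start=0):
    """Replace each value with its difference from the previous one."""
    deltas = []
    prev = start
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def fibonacci_encode(n):
    """Return the Fibonacci (Zeckendorf) codeword of a positive integer.

    The codeword lists the Fibonacci terms 1, 2, 3, 5, 8, ... from smallest
    to largest, and a final '1' is appended so every codeword ends in '11'.
    """
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first term that exceeds n
    bits = []
    for f in reversed(fibs):  # greedy pass from the largest term down
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    return "".join(reversed(bits)) + "1"


def move_to_front_encode(symbols, table):
    """Emit each symbol's index in the table, then move it to the front."""
    table = list(table)  # work on a copy; the caller's table is untouched
    indices = []
    for s in symbols:
        i = table.index(s)
        indices.append(i)
        table.insert(0, table.pop(i))
    return indices


# Hypothetical sample data, chosen only for illustration.
print(differential_encode([100, 104, 108, 100]))
print([fibonacci_encode(n) for n in (1, 2, 3, 4, 11)])
print(move_to_front_encode([0x8000, 0x8000, 0x9000, 0x8000], [0x9000, 0x8000]))
```

All three share one property: small or recently seen values map to short outputs, which is the kind of spatial and temporal redundancy the abstract refers to. Differential coding turns nearby addresses into small deltas, Fibonacci coding gives small integers short self-delimiting codewords, and Move-to-Front gives recently used symbols low indices.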
Architecture-Aware Real-Time Compression of Execution Traces