Abstract
Application analysis and simulation tools are used extensively by embedded system designers to improve existing optimization techniques or develop new ones. We propose the Alleria framework, which makes it easier for designers to comprehensively collect critical information about one or more target applications, such as virtual and physical memory addresses, accessed values, and thread schedules. Comprehensive profilers of this kind often incur substantial performance overheads that can exceed native execution time by orders of magnitude. We discuss how this overhead can be significantly reduced using a novel profiling mechanism called adaptive profiling. We develop a heuristic-based adaptive profiling mechanism and evaluate its performance using single-threaded and multi-threaded applications. The proposed technique can improve profiling throughput by up to 145%, and by 37% on average, enabling Alleria to comprehensively profile applications at a throughput of over 3 million instructions per second.
Alleria: An Advanced Memory Access Profiling Framework