research-article

Alleria: An Advanced Memory Access Profiling Framework

Published: 08 October 2019

Abstract

Application analysis and simulation tools are used extensively by embedded system designers to improve existing optimization techniques or develop new ones. We propose the Alleria framework to make it easier for designers to comprehensively collect critical information such as virtual and physical memory addresses, accessed values, and thread schedules about one or more target applications. Such profilers often incur substantial performance overheads, with profiled runs taking orders of magnitude longer than native execution. We discuss how that overhead can be significantly reduced using a novel profiling mechanism called adaptive profiling. We develop a heuristic-based adaptive profiling mechanism and evaluate its performance using single-threaded and multi-threaded applications. The proposed technique can improve profiling throughput by up to 145% and by 37% on average, enabling Alleria to be used to comprehensively profile applications with a throughput of over 3 million instructions per second.
