Abstract
Cache memories are an essential component of modern processors and account for a large percentage of their power consumption. Their efficacy depends heavily on the memory demands of the software. Thus, finding the optimal cache for a particular program is not a trivial task and usually involves exhaustive simulation. In this article, we propose a machine learning–based methodology that predicts the optimal cache reconfiguration for any given application, based on its dynamic instructions. Our evaluation shows that the methodology reaches 91.1% accuracy. Moreover, an additional experiment shows that a small portion of the dynamic instructions (10%) suffices to reach 89.71% accuracy.
Index Terms
A Machine Learning Methodology for Cache Memory Design Based on Dynamic Instructions