Abstract
Software cache coherence schemes tend to be the solution of choice in dedicated multi/many-core systems on chip, as they make the hardware much simpler and more predictable. However, despite the developers’ efforts, it is hard to make sure that all the preventive measures needed to ensure coherence are taken. In this work, we propose a method to identify potential cache coherence violations using traces obtained from virtual platforms. These traces contain causality relations among events, which both simplify the analysis and remove the need to rely on timestamps. Our method identifies the potential violations that may occur during a given execution for write-through and write-back cache policies, and is therefore independent of the software coherence protocol. We conducted experiments on parallel applications running on a lightweight SMP operating system, and we were able to detect coherence issues that we could then solve.
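To make the idea of a trace-based check concrete, here is a minimal sketch for the write-through case only. It is not the authors' algorithm: the `Event` record, the `find_stale_reads` function, and the per-line version counters are all hypothetical names introduced for illustration. The sketch also assumes that the trace is already given as a list in a causally consistent order and that the caches are no-write-allocate, neither of which is stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One trace record (hypothetical format, not the paper's)."""
    cpu: int    # identifier of the core that issued the access
    op: str     # "read", "write", or "invalidate"
    line: int   # address of the cache line that is accessed


def find_stale_reads(trace):
    """Return the trace indices of reads that hit an outdated cached copy.

    Assumed model: write-through, no-write-allocate caches, and a trace
    that is listed in a causally consistent order.
    """
    cached = {}      # (cpu, line) -> version held in that core's cache
    memory = {}      # line -> latest version written through to memory
    violations = []
    for index, event in enumerate(trace):
        key = (event.cpu, event.line)
        latest = memory.get(event.line, 0)
        if event.op == "write":
            # Write-through: memory is updated immediately.
            memory[event.line] = latest + 1
            # The writer's own copy is refreshed only if it already exists.
            if key in cached:
                cached[key] = latest + 1
        elif event.op == "invalidate":
            # Software coherence action: drop the local copy, if any.
            cached.pop(key, None)
        elif event.op == "read":
            if key in cached:
                # Hit: the copy is stale if memory has moved on since the fill.
                if cached[key] != latest:
                    violations.append(index)
            else:
                # Miss: the line is fetched from up-to-date memory.
                cached[key] = latest
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return violations
```

For example, if core 0 reads a line, core 1 then writes it, and core 0 reads it again without invalidating its copy first, the second read is reported. A write-back variant would additionally need to track dirty lines and explicit flush events.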
Index Terms
Detecting Software Cache Coherence Violations in MPSoC Using Traces Captured on Virtual Platforms