Minimising Access Conflicts on Shared Multi-Bank Memory

Published: 27 September 2017
Abstract

A common multi-core pattern consists of processors communicating through shared, multi-banked on-chip memory. Two approaches to address mapping exist: interleaved mapping, which spreads consecutive data over all banks, and contiguous mapping, which stores consecutive data on a single bank.
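The difference between the two mappings can be illustrated with a minimal sketch. The bank count, bank size and interleaving granularity below are illustrative assumptions, not values taken from the text, and the function names are hypothetical.

```python
# Illustrative parameters (assumptions, not taken from the article).
NUM_BANKS = 16            # number of memory banks
BANK_SIZE = 128 * 1024    # bytes per bank
LINE_SIZE = 64            # interleaving granularity in bytes


def interleaved_bank(addr):
    """Interleaved mapping: consecutive lines rotate over all banks."""
    return (addr // LINE_SIZE) % NUM_BANKS


def contiguous_bank(addr):
    """Contiguous mapping: each bank holds one consecutive address range."""
    return addr // BANK_SIZE


def banks_touched(start, length, mapping):
    """Return the sorted list of banks hit by a buffer of `length` bytes."""
    first_line = start // LINE_SIZE
    last_line = (start + length - 1) // LINE_SIZE
    return sorted({mapping(line * LINE_SIZE)
                   for line in range(first_line, last_line + 1)})


if __name__ == "__main__":
    # A 1 KiB buffer at address 0 under each mapping.
    print(banks_touched(0, 1024, interleaved_bank))
    print(banks_touched(0, 1024, contiguous_bank))
```

Under these assumptions, a buffer read sequentially touches every bank with the interleaved mapping, whereas with the contiguous mapping it stays on one bank, so whether two cores collide depends on where their buffers are placed.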

In this work, we compare both approaches on the Kalray MPPA-256 platform. For contiguous mapping, we propose an algorithm, based on graph colouring techniques, to automatically perform the assignment of data blocks to memory banks with the goal of minimising access collisions and delays. Experiments with representative, parallel real-world benchmarks show that 69% of the tested configurations, when optimised for contiguous mapping by our algorithm, run up to 86% faster on average than with interleaved mapping.
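To give a feel for how graph colouring applies to bank assignment, the following is a generic greedy heuristic, not the algorithm proposed in the article. It assumes a hypothetical conflict graph whose nodes are data blocks and whose edge weights estimate the cost of placing two concurrently accessed blocks on the same bank. Bank capacity is ignored for brevity.

```python
def assign_banks(blocks, conflicts, num_banks):
    """Greedy weighted colouring: banks play the role of colours.

    blocks    -- list of block names
    conflicts -- dict mapping (block_a, block_b) to a conflict weight
    num_banks -- number of available banks
    """
    # Build a symmetric adjacency map from the edge list.
    neighbours = {b: {} for b in blocks}
    for (a, b), weight in conflicts.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    # Place the most conflict-prone blocks first.
    order = sorted(blocks, key=lambda b: -sum(neighbours[b].values()))

    assignment = {}
    for block in order:
        # Cost of each bank = total weight of conflicting blocks already there.
        cost = [0] * num_banks
        for other, weight in neighbours[block].items():
            if other in assignment:
                cost[assignment[other]] += weight
        # Pick the cheapest bank (lowest index on ties).
        assignment[block] = min(range(num_banks), key=cost.__getitem__)
    return assignment


def total_conflict(assignment, conflicts):
    """Sum of the weights of conflicting pairs that share a bank."""
    return sum(weight for (a, b), weight in conflicts.items()
               if assignment[a] == assignment[b])


if __name__ == "__main__":
    blocks = ["A", "B", "C", "D"]
    conflicts = {("A", "B"): 5, ("B", "C"): 3, ("C", "D"): 4, ("A", "C"): 2}
    placement = assign_banks(blocks, conflicts, num_banks=2)
    print(placement, total_conflict(placement, conflicts))
```

A greedy pass like this gives no optimality guarantee. It only shows the shape of the problem: blocks that are accessed at the same time should end up on different banks whenever enough banks are available.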
