DOI: 10.1145/1454115.1454156
research-article

Hybrid access-specific software cache techniques for the Cell BE architecture

Published: 25 October 2008

ABSTRACT

The difficulty of programming is one of the main impediments to the broad acceptance of multi-core systems that have no hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but it can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that classifies memory accesses at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two cache structures, each optimized for its access pattern. These structures are designed so that high-level compiler optimizations can aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software cache overhead in the innermost loop. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedup factors of 3.5 to 8.4 over a traditional software cache approach. As a result, we demonstrate that the Cell BE processor can be a competitive alternative to a modern server-class multi-core such as the IBM Power5 processor for a set of parallel NAS applications.
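The difference between the two access classes can be illustrated with a minimal sketch in portable C. This is not the paper's implementation: the direct-mapped layout, the names `sw_cache`, `sw_cache_line`, `sum_indirect`, and `sum_strided`, and the sizes `LINE_ELEMS` and `NUM_LINES` are all hypothetical, and `memcpy` stands in for a DMA transfer from global memory into the local store. The irregular path pays one tag check per access, while the high-locality path performs one lookup per cache line and leaves the innermost loop free of cache code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_ELEMS 32   /* elements per cache line (assumed value) */
#define NUM_LINES  8    /* slots in the direct-mapped cache (assumed value) */

typedef struct {
    long tag[NUM_LINES];                /* global line held in each slot, -1 if empty */
    float data[NUM_LINES][LINE_ELEMS];  /* local copies of the cached lines */
    const float *global;                /* stands in for global (main) memory */
    unsigned long lookups;              /* number of tag checks performed */
    unsigned long misses;               /* number of line fetches performed */
} sw_cache;

static void sw_cache_init(sw_cache *c, const float *global) {
    for (int i = 0; i < NUM_LINES; i++) c->tag[i] = -1;
    c->global = global;
    c->lookups = 0;
    c->misses = 0;
}

/* Return the local buffer for global line `line`, fetching it on a miss.
 * The global array length is assumed to be a multiple of LINE_ELEMS. */
static const float *sw_cache_line(sw_cache *c, long line) {
    assert(line >= 0);
    int slot = (int)(line % NUM_LINES);
    c->lookups++;
    if (c->tag[slot] != line) {
        /* memcpy is a placeholder for a DMA get on real hardware */
        memcpy(c->data[slot], c->global + line * LINE_ELEMS, sizeof c->data[slot]);
        c->tag[slot] = line;
        c->misses++;
    }
    return c->data[slot];
}

/* Irregular accesses: every reference goes through a full lookup. */
static float sum_indirect(sw_cache *c, const int *idx, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float *buf = sw_cache_line(c, idx[i] / LINE_ELEMS);
        s += buf[idx[i] % LINE_ELEMS];
    }
    return s;
}

/* High-locality accesses: one lookup per line, hoisted out of the inner loop. */
static float sum_strided(sw_cache *c, size_t n) {
    float s = 0.0f;
    for (size_t base = 0; base < n; base += LINE_ELEMS) {
        const float *buf = sw_cache_line(c, (long)(base / LINE_ELEMS));
        size_t len = (n - base < LINE_ELEMS) ? (n - base) : LINE_ELEMS;
        for (size_t j = 0; j < len; j++)
            s += buf[j];  /* no cache code in the innermost loop */
    }
    return s;
}
```

In this sketch, summing 64 contiguous elements with `sum_strided` costs two lookups, whereas `sum_indirect` costs one lookup per element regardless of how many distinct lines are touched. A compiler that can prove an access is contiguous could, under these assumptions, emit the second form and then unroll or reorder the inner loop freely.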

Published in

PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
October 2008
328 pages
ISBN: 9781605582825
DOI: 10.1145/1454115

Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 121 of 471 submissions, 26%
