DOI: 10.1145/1134650.1134679
Article

Optimizing compiler for shared-memory multiple SIMD architecture

Published: 14 June 2006

ABSTRACT

The rapid growth of multimedia and game applications puts increasing pressure on the processing capability of modern processors. Multiple SIMD architectures are widely used as accelerators in the multimedia processing field.

For reasons of power consumption and chip size, shared-memory multiple SIMD (SM-SIMD) architectures are mainly used in embedded SoCs. To better fit the mobile environment, the number of registers is limited as well. Although the shared-memory multiple SIMD architecture simplifies chip design, these constraints are the major obstacles to mapping real multimedia applications onto such architectures. To the best of our knowledge, there has been little research on optimization techniques for shared-memory multiple SIMD architectures.

In this paper, we present a compiler framework that aims to automatically generate high-performance code for shared-memory multiple SIMD architectures. In this framework, we reduce contention for the shared data bus by increasing register locality, improve data bus utilization through read-only data vector replication, and address the limited number of registers with a resource allocation algorithm. The framework also handles issues concerning data transformation. Experimental results show that the framework successfully maps real multimedia applications to shared-memory multiple SIMD architectures: it achieves an average speedup of 3.19 and an average utilization of 52.6% on an SM-SIMD architecture with 8 SIMD units.


Published in

LCTES '06: Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
June 2006
220 pages
ISBN: 159593362X
DOI: 10.1145/1134650

ACM SIGPLAN Notices, Volume 41, Issue 7
Proceedings of the 2006 LCTES Conference
July 2006
208 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1159974

Copyright © 2006 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 116 of 438 submissions, 26%
