ABSTRACT
With the rapid growth of multimedia and gaming applications, these workloads place ever-increasing demands on the processing capability of modern processors. Multiple-SIMD architectures are widely used as multimedia accelerators in the multimedia processing field. Because of power-consumption and chip-size considerations, shared-memory multiple-SIMD (SM-SIMD) architectures are mainly used in embedded SoCs. To further suit the mobile environment, these architectures are also constrained to a limited number of registers. Although the shared-memory design simplifies the chip, these constraints are the major obstacles to mapping real multimedia applications onto such architectures. To the best of our knowledge, there has so far been little research on optimization techniques for SM-SIMD architectures. In this paper, we present a compiler framework that aims to automatically generate high-performance code for SM-SIMD architectures. In this framework, we reduce contention on the shared data bus by increasing register locality, improve data-bus utilization through read-only data vector replication, and address the limited number of registers with a resource allocation algorithm. The framework also handles issues concerning data transformation. The experimental results show that the framework successfully maps real multimedia applications onto SM-SIMD architectures: it achieves an average speedup of 3.19 and, on an SM-SIMD architecture with 8 SIMD units, an average utilization of 52.6%.
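The three techniques named in the abstract all target the same scarce resource: transfers on the single shared data bus. The toy model below is not the paper's algorithm. It is a minimal sketch under stated assumptions: each unicast transfer moves one vector to one SIMD unit, an optional broadcast mode delivers one vector to all units in a single transfer, and each unit has a fixed budget of spare vector registers. All names (`Workload`, `free_regs`, `naive`, `register_locality`, `replication_broadcast`) are hypothetical. The model only counts bus transfers, to show how register locality, read-only replication, and the register budget interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical parameters of a toy SM-SIMD kernel (not taken from the paper).
    units: int           # SIMD units sharing one data bus
    iterations: int      # loop iterations executed by every unit
    private_vecs: int    # unit-specific vectors loaded per iteration
    shared_ro_vecs: int  # read-only vectors every unit needs in every iteration
    free_regs: int       # spare vector registers per unit for keeping read-only data resident


def naive(w: Workload) -> int:
    # Every unit reloads everything it needs in every iteration,
    # using one unicast bus transfer per vector.
    return w.units * w.iterations * (w.private_vecs + w.shared_ro_vecs)


def register_locality(w: Workload) -> int:
    # Keep as many read-only vectors resident in registers as the budget allows.
    # Each resident vector is loaded once per unit; the rest are reloaded per iteration.
    resident = min(w.shared_ro_vecs, w.free_regs)
    reloaded = w.shared_ro_vecs - resident
    per_unit = resident + w.iterations * (w.private_vecs + reloaded)
    return w.units * per_unit


def replication_broadcast(w: Workload) -> int:
    # ASSUMPTION: the bus supports broadcast, so a single transfer replicates a
    # read-only vector into every unit. Shared data then costs the same
    # number of transfers regardless of how many units there are.
    resident = min(w.shared_ro_vecs, w.free_regs)
    reloaded = w.shared_ro_vecs - resident
    private = w.units * w.iterations * w.private_vecs
    shared = resident + w.iterations * reloaded
    return private + shared


if __name__ == "__main__":
    for regs in (4, 2):
        w = Workload(units=8, iterations=100, private_vecs=1, shared_ro_vecs=4, free_regs=regs)
        print(regs, naive(w), register_locality(w), replication_broadcast(w))
```

With 8 units, 100 iterations, one private and four read-only vectors per iteration, and four spare registers, the model counts 4000, 832, and 804 bus transfers for the three strategies. Shrinking `free_regs` to 2 changes the counts to 4000, 2416, and 1002, which illustrates why register allocation becomes decisive once registers are scarce.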