ABSTRACT
Increasing complexity in the communication patterns of embedded applications parallelized over multiple processing units makes it difficult to continue using traditional bus-based on-chip communication techniques. The main contribution of this paper is to demonstrate the importance of compiler technology in reducing the power consumption of applications designed for emerging multiprocessor, Network-on-Chip (NoC) based embedded systems. Specifically, we propose and evaluate a compiler-directed approach to NoC power management in the context of array-intensive applications, which are used frequently in embedded image/video processing. The unique characteristic of the compiler-based approach proposed in this paper is that it increases the idle periods of communication channels by reusing the same set of channels for as many communication messages as possible. The channels left unused can then take better advantage of the underlying power-saving mechanism employed by the network architecture. However, this channel reuse optimization should be applied with care, as it can hurt performance if two or more simultaneous communications are mapped onto the same set of channels. Therefore, the problem addressed in this paper is one of reducing the number of channels used to implement a set of communications without increasing the communication latency significantly. To test the effectiveness of our approach, we implemented it within an optimizing compiler and performed experiments using twelve application codes and a network simulation environment. Our experiments show that the proposed compiler-based approach is very successful in practice and works well under both hardware-based and software-based channel turn-off schemes.
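The channel reuse idea can be illustrated with a small sketch. This is not the algorithm from the paper; it is a greedy toy under stated assumptions: a 2D mesh, only two candidate paths per message (X-then-Y and Y-then-X), and message time windows that are known at compile time. The names `route` and `allocate` are hypothetical. Each message takes the candidate path that adds the fewest new links, and a path is skipped if it would share a link with a message whose time window overlaps.

```python
def route(src, dst, x_first=True):
    """Directed links on a dimension-ordered path across a 2D mesh."""
    pos = list(src)
    links = []
    dims = (0, 1) if x_first else (1, 0)
    for d in dims:
        while pos[d] != dst[d]:
            nxt = list(pos)
            nxt[d] += 1 if dst[d] > pos[d] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links


def allocate(messages):
    """Greedy channel reuse (illustrative assumption, not the paper's method).

    messages: list of (src, dst, start, end) with half-open time windows.
    Returns (plan, used): the chosen path per message and the set of
    links that carry at least one message.
    """
    used = set()   # links that must stay powered at some point
    busy = {}      # link -> list of (start, end) windows already assigned
    plan = []
    for src, dst, start, end in messages:
        best = None
        for x_first in (True, False):
            path = route(src, dst, x_first)
            # Reject paths that overlap in time with an earlier message.
            conflict = any(
                s < end and start < e
                for link in path
                for s, e in busy.get(link, [])
            )
            if conflict:
                continue
            new_links = sum(1 for link in path if link not in used)
            if best is None or new_links < best[0]:
                best = (new_links, path)
        if best is None:
            # Assumption: fall back to X-then-Y and accept contention.
            best = (0, route(src, dst, True))
        path = best[1]
        for link in path:
            used.add(link)
            busy.setdefault(link, []).append((start, end))
        plan.append(path)
    return plan, used


# Two messages at different times share one path; the other path stays idle.
plan, used = allocate([((0, 0), (1, 1), 0, 10), ((0, 0), (1, 1), 20, 30)])
print(len(used))
```

In this sketch, links that never enter `used` are the ones a turn-off scheme could keep powered down for the whole run. Simultaneous messages are steered onto disjoint paths where one exists, which mirrors the latency concern raised in the abstract.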
Compiler-directed channel allocation for saving power in on-chip networks
Proceedings of the 2006 POPL Conference