Abstract
Because they offer extensive parallelism at a very high level of integration, Multi-Processor Systems-on-Chip (MPSoCs) are good candidates for systems and applications such as multimedia. Memory is becoming a key factor in improving the power, performance, and area of these applications, which manipulate large amounts of data and therefore require high computing and memory capacity. New programming models have recently been introduced, creating a need for optimization and mapping techniques suited to embedded systems and their programming models. This article presents novel approaches for combining memory optimization with the mapping of data-driven applications while considering anti-dependence conflicts. Two approaches are studied and integrated with existing mapping algorithms. The first, based on heuristic algorithms, keeps the graph-transformation stage for memory optimization separate from the mapping stage and enables their combination in a design flow. The second, based on evolutionary algorithms, merges the two stages into a single stage. Significant improvements are obtained in memory gain, communication load, and physical links.
Index Terms
Integrating Memory Optimization with Mapping Algorithms for Multi-Processors System-on-Chip