Abstract
In this article, we propose a heuristic algorithm for scheduling synchronous data flow (SDF) models on scratch pad memory (SPM) enhanced processors with the objective of minimizing their steady-state execution time. The task involves partitioning the limited on-chip SPM between actor code and data buffers, and executing actors in such a manner that the physical SPM is time-shared among different actors and buffers (formally defined as code overlay and data overlay, respectively). In our setup, a traditional minimum buffer schedule could result in very high code overlay overhead and therefore may not be optimal. To reduce the number of direct memory access (DMA) transfers, actors need to be grouped into segments. Prefetching of code and data overlays, which overlaps DMA transfers with actor executions, also needs to be exploited. The efficiency of our heuristic was evaluated by compiling ten stream applications onto one synergistic processing engine (SPE) of an IBM Cell Broadband Engine. We compare the performance results of our heuristic approach with a minimum buffer scheduling approach and a 3-stage ILP approach, and show that our heuristic is able to generate high-quality solutions with a short algorithm run time.
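The trade-off between buffer size and code overlay overhead can be illustrated with a minimal sketch. This is not the heuristic proposed in the article. It assumes a hypothetical two-actor SDF graph (actor A produces `prod` tokens per firing, actor B consumes `cons`), and it assumes that the SPM code region holds only one actor at a time, so every switch between actors in the periodic schedule costs one code DMA transfer. All function names are illustrative.

```python
from math import gcd

def repetition_vector(prod, cons):
    """Smallest firing counts (qa, qb) satisfying qa * prod == qb * cons."""
    g = gcd(prod, cons)
    return cons // g, prod // g

def demand_driven_schedule(prod, cons):
    """Fire B as soon as it has enough tokens, otherwise fire A.
    For this two-actor graph the result keeps the buffer small."""
    _, qb = repetition_vector(prod, cons)
    schedule, tokens, fired_b = [], 0, 0
    while fired_b < qb:
        if tokens >= cons:
            tokens -= cons
            fired_b += 1
            schedule.append("B")
        else:
            tokens += prod
            schedule.append("A")
    return schedule

def grouped_schedule(prod, cons):
    """Fire all instances of A, then all instances of B (one segment each)."""
    qa, qb = repetition_vector(prod, cons)
    return ["A"] * qa + ["B"] * qb

def peak_buffer(schedule, prod, cons):
    """Largest number of tokens held on the A -> B edge during one iteration."""
    tokens, peak = 0, 0
    for actor in schedule:
        tokens += prod if actor == "A" else -cons
        peak = max(peak, tokens)
    return peak

def code_loads(schedule):
    """Code DMA transfers per steady-state iteration, assuming the code
    region holds one actor; index -1 wraps to the previous iteration."""
    return sum(1 for i in range(len(schedule)) if schedule[i] != schedule[i - 1])

if __name__ == "__main__":
    for name, build in [("demand-driven", demand_driven_schedule),
                        ("grouped", grouped_schedule)]:
        s = build(2, 3)
        print(name, "".join(s), "buffer:", peak_buffer(s, 2, 3),
              "code loads:", code_loads(s))
```

With `prod = 2` and `cons = 3`, the demand-driven schedule needs a smaller buffer but switches actors more often, while the grouped schedule needs a larger buffer and fewer code loads. Under the stated assumptions, this is the tension that motivates grouping actors into segments.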
Index Terms
Scheduling of synchronous data flow models onto scratchpad memory-based embedded processors