Scheduling of synchronous data flow models onto scratchpad memory-based embedded processors

Published: 06 December 2013

Abstract

In this article, we propose a heuristic algorithm for scheduling synchronous data flow (SDF) models on scratchpad memory (SPM) enhanced processors with the objective of minimizing their steady-state execution time. The task involves partitioning the limited on-chip SPM between actor code and data buffers, and executing actors in such a manner that the physical SPM is time-shared among different actors and buffers (formally defined as code overlay and data overlay, respectively). In our setup, a traditional minimum buffer schedule could result in very high code overlay overhead and therefore may not be optimal. To reduce the number of direct memory access (DMA) transfers, actors need to be grouped into segments. Prefetching of code and data overlays, which overlaps DMA transfers with actor executions, also needs to be exploited. The efficiency of our heuristic was evaluated by compiling ten stream applications onto one synergistic processing engine (SPE) of an IBM Cell Broadband Engine. We compare the performance results of our heuristic approach with a minimum buffer scheduling approach and a 3-stage ILP approach, and show that our heuristic is able to generate high-quality solutions with fast algorithm run time.
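
The trade-off between buffer size and code overlay overhead can be illustrated with a toy model. The sketch below is not the article's heuristic. It assumes a hypothetical two-actor chain A -> B with one token produced and consumed per firing, four firings of each actor per steady-state iteration, and an SPM code region that holds a configurable number of actors with least-recently-used replacement. The function names `code_loads` and `peak_buffer` are invented for illustration.

```python
def code_loads(schedule, code_slots=1):
    """Count code-overlay loads for one pass over `schedule`.

    Assumption: the SPM code region holds `code_slots` actors and evicts
    the least recently used one when full. Each load stands for one DMA
    transfer of actor code.
    """
    resident = []  # least recently used first
    loads = 0
    for actor in schedule:
        if actor in resident:
            resident.remove(actor)  # refresh its position below
        else:
            loads += 1
            if len(resident) == code_slots:
                resident.pop(0)  # evict the least recently used actor
        resident.append(actor)
    return loads


def peak_buffer(schedule, produce=1, consume=1):
    """Peak number of tokens on the single edge A -> B."""
    tokens = peak = 0
    for actor in schedule:
        if actor == "A":
            tokens += produce
        else:
            tokens -= consume
            if tokens < 0:
                raise ValueError("schedule fires B before its input is ready")
        peak = max(peak, tokens)
    return peak


# Hypothetical schedules for one steady-state iteration.
min_buffer = list("ABABABAB")  # interleaved: smallest buffer
grouped = list("AAAABBBB")     # actors grouped into two segments

print(code_loads(min_buffer), peak_buffer(min_buffer))  # 8 1
print(code_loads(grouped), peak_buffer(grouped))        # 2 4
```

Under these assumptions the interleaved schedule needs a one-token buffer but reloads actor code on every firing, while the grouped schedule needs a four-token buffer and only two code loads. A real scheduler would also have to account for DMA latency and for prefetching that overlaps transfers with execution, which this sketch ignores.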
