
Generalized cache tiling for dataflow programs

Published: 13 June 2016

Abstract

The dataflow programming paradigm has made it easier to express a wide range of algorithmic applications on embedded platforms across many application domains. Whether through a Domain Specific Language (DSL) or a more general-purpose one, the paradigm lets the programmer state the successive steps of an algorithm intuitively and link them through data communications. Cache-memory optimization in this context has been a subject of interest since the early '90s, because the reuse and communication of data between the agents of a dataflow program is a key factor in achieving a high-performance implementation within the tight limits of embedded architectures. To improve data reuse among dataflow agents, we propose a model of the communications and data usage within a dataflow program. Besides providing an estimate of the number of cache misses that a given schedule generates, this model allows us to specify the associated optimization problem in a form identical to loop-nest tiling. Improving on existing state-of-the-art methods, we extend our tiling technique to include non-uniform dependencies on one dimension of the iteration space. Applying the proposed technique to dataflow programs expressed in the StreamIt framework, we show significant reductions in the number of cache misses for a majority of test cases compared to existing optimizations.
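As a rough illustration of why the schedule of a dataflow program affects cache misses, the following sketch counts misses for a two-agent pipeline under two schedules. It is not the model proposed in the paper. It assumes a fully associative LRU cache, one token per cache line, and a producer A that writes into a buffer which a consumer B then reads. The names `count_misses` and `schedule_trace` and the parameters `N`, `C`, and the tile size are hypothetical and chosen only for this example.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            # Hit: mark the line as most recently used.
            cache.move_to_end(addr)
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                # Evict the least recently used line.
                cache.popitem(last=False)
    return misses


def schedule_trace(n_tokens, tile):
    """Address trace for producer A feeding consumer B through a `tile`-slot buffer.

    A fires `tile` times, then B fires `tile` times, and the pattern repeats
    until `n_tokens` tokens have been processed. The buffer slots are reused
    from one tile to the next.
    """
    trace = []
    for start in range(0, n_tokens, tile):
        count = min(tile, n_tokens - start)
        trace.extend(range(count))  # A writes slots 0..count-1
        trace.extend(range(count))  # B reads the same slots
    return trace


N, C = 64, 16  # assumed token count and cache capacity (in lines)

# Untiled: A produces all N tokens before B consumes any of them.
untiled = count_misses(schedule_trace(N, N), C)

# Tiled: A and B alternate on blocks of 8 tokens, which fit in the cache.
tiled = count_misses(schedule_trace(N, 8), C)

print(untiled, tiled)
```

Under these assumptions, the untiled schedule evicts every token before B reads it, while the tiled schedule only pays the cold misses of the small reused buffer. Real caches (set associativity, line sizes, several agents with different rates) behave less cleanly than this toy setting.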

