Abstract
The dataflow programming paradigm has made it easier to express a large number of algorithmic applications on embedded platforms across a wide variety of application domains. Whether the language is domain-specific (a DSL) or general-purpose, the dataflow paradigm makes it intuitive to state the successive steps of an algorithm and to link them through data communications. Cache-memory optimization in this context has been a subject of interest since the early '90s, because the reuse and communication of data between the agents of a dataflow program is a key factor in achieving a high-performance implementation within the tight resource limits of embedded architectures. To improve data reuse among the dataflow agents, we propose a model of the communications and data usage within a dataflow program. Aside from providing an estimate of the number of cache misses that a given schedule generates, this model allows us to state the associated optimization problem in the same form as loop-nest tiling. Improving on existing state-of-the-art methods, we extend our tiling technique to include non-uniform dependencies on one of the dimensions of the iteration space. When applying the proposed technique to dataflow programs expressed within the StreamIt framework, we show significant reductions in the number of cache misses for a majority of test cases when compared to existing optimizations.
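To make concrete the claim that the schedule of a dataflow program determines its cache misses, the following is a minimal sketch and not the model proposed in the paper. It assumes a hypothetical two-agent pipeline (A produces one item per firing, B consumes one), a small fully associative LRU cache, and made-up sizes; the names `LRUCache`, `count_misses`, `interleaved` and `batched` are illustrative only. It compares a fine-grained schedule (A B A B ...) with a batched one (A A B B ...), which is the kind of regrouping of firings that tiling performs.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that only counts misses (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            # Hit: mark the line as most recently used.
            self.lines.move_to_end(line)
        else:
            # Miss: load the line, evicting the least recently used one if full.
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)


def count_misses(schedule, state_lines=4, capacity=6):
    """Replay a firing sequence of agents "A" and "B" and return the miss count.

    Assumption: each firing touches all of the agent's private state lines,
    then one buffer line (A writes item n, B reads item n).
    """
    cache = LRUCache(capacity)
    produced = consumed = 0
    for agent in schedule:
        for i in range(state_lines):
            cache.access((agent, i))
        if agent == "A":
            cache.access(("buf", produced))
            produced += 1
        else:
            cache.access(("buf", consumed))
            consumed += 1
    return cache.misses


def interleaved(n):
    """Fine-grained schedule: A B A B ... for n items."""
    return ["A", "B"] * n


def batched(n, k):
    """Batched schedule: k firings of A, then k firings of B, repeated."""
    schedule = []
    for _ in range(n // k):
        schedule += ["A"] * k + ["B"] * k
    return schedule


n = 8
fine = count_misses(interleaved(n))
coarse = count_misses(batched(n, 2))
print("interleaved misses:", fine)
print("batched misses:", coarse)
```

In this toy setting the two agents' state does not fit in the cache at the same time, so alternating between them reloads that state on every firing, while batching reloads it once per batch. The batch size cannot grow freely, because the buffered items then compete with the agents' state for cache space; choosing it well is the kind of trade-off a tiling formulation has to resolve.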
Generalized cache tiling for dataflow programs
LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems