A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics

Published: 19 June 2019

Abstract

Streaming analytics require real-time aggregation and processing of geographically distributed data streams continuously over time. The typical analytics infrastructure for processing such streams follows a hub-and-spoke model, comprising multiple edges connected to a center by a wide-area network (WAN). The aggregation of such streams often requires that the results be available at the center within a certain acceptable delay bound. Further, the WAN bandwidth available between the edges and the center is often scarce or expensive, requiring that the traffic between the edges and the center be minimized. We propose a novel Time-to-Live (TTL)-based mechanism for real-time aggregation that provably optimizes both delay and traffic, providing a theoretical basis for understanding the delay-traffic tradeoff that is fundamental to streaming analytics. Our TTL-based optimization model provides analytical answers to three questions: how much aggregation should be performed at the edge versus the center, how much delay can be incurred at the edges, and how the edge-to-center bandwidth must be apportioned across applications with different delay requirements. To evaluate our approach, we implement our TTL-based aggregation mechanism in Apache Flink, a popular stream analytics framework. We deploy our Flink implementation in a hub-and-spoke architecture on geo-distributed Amazon EC2 data centers and a WAN-emulated local testbed, and run aggregation tasks for realistic workloads derived from extensive Akamai and Twitter traces. The delay-traffic tradeoff achieved by our Flink implementation agrees closely with the theoretical predictions of our model. We show that by deriving the optimal TTLs using our model, our system can achieve a "sweet spot" where both delay and traffic are minimized, in comparison to traditional aggregation schemes such as batching and streaming.
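The delay-traffic tradeoff described in the abstract can be illustrated with a small sketch. The abstract does not spell out the mechanism's exact semantics, so the following is only one plausible reading, with every name (`TTLEdgeAggregator`, `on_record`, `flush_all`, `run`) and the sample records being hypothetical: an edge buffers a running sum per key, starts a TTL timer when the first record for a key is buffered, and sends the partial aggregate to the center once that timer expires. Under this assumption, a TTL of zero behaves like pure streaming (one message per record), while a larger TTL trades added delay for fewer messages over the WAN.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical sample input: (arrival_time_seconds, key, value).
RECORDS: List[Tuple[float, str, float]] = [
    (0.0, "a", 1.0),
    (1.0, "a", 2.0),
    (2.0, "b", 5.0),
    (6.0, "a", 4.0),
]


@dataclass
class TTLEdgeAggregator:
    """Illustrative edge-side aggregator (an assumption, not the paper's code).

    Sums values per key. A key's TTL starts when its first record is buffered;
    once the TTL has expired, the partial sum is sent to the center.
    """
    ttl: float
    # key -> (expiry_time, partial_sum)
    buffer: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Messages sent to the center: (send_time, key, partial_sum)
    sent: List[Tuple[float, str, float]] = field(default_factory=list)

    def _flush_expired(self, now: float) -> None:
        expired = [k for k, (expiry, _) in self.buffer.items() if expiry <= now]
        for key in expired:
            expiry, total = self.buffer.pop(key)
            self.sent.append((expiry, key, total))

    def on_record(self, now: float, key: str, value: float) -> None:
        # Send anything whose TTL has run out before handling the new record.
        self._flush_expired(now)
        if key in self.buffer:
            expiry, total = self.buffer[key]
            self.buffer[key] = (expiry, total + value)  # aggregate at the edge
        else:
            self.buffer[key] = (now + self.ttl, value)  # start the TTL timer

    def flush_all(self) -> None:
        # End of input: send every remaining partial sum at its expiry time.
        for key, (expiry, total) in list(self.buffer.items()):
            self.sent.append((expiry, key, total))
        self.buffer.clear()


def run(ttl: float, records: List[Tuple[float, str, float]]):
    """Replay records through an aggregator and return the messages sent."""
    agg = TTLEdgeAggregator(ttl=ttl)
    for now, key, value in records:
        agg.on_record(now, key, value)
    agg.flush_all()
    return agg.sent


if __name__ == "__main__":
    for ttl in (0.0, 5.0):
        messages = run(ttl, RECORDS)
        print(f"ttl={ttl}: {len(messages)} messages to center -> {messages}")
```

In this toy run the per-key totals that reach the center are the same for both TTL values; only the number of edge-to-center messages and the time each partial result is sent change. Choosing the TTLs optimally, as the paper's model does, is not attempted here.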
