ABSTRACT
A fundamental problem in distributed computation is the distributed evaluation of functions. The goal is to determine the value of a function over a set of distributed inputs, in a communication efficient manner. Specifically, we assume that each node holds a time varying input vector, and we are interested in determining, at any given time, whether the value of an arbitrary function on the average of these vectors crosses a predetermined threshold.
In this paper, we introduce a new method for monitoring distributed data, which we term shape sensitive geometric monitoring. It is based on a geometric interpretation of the problem, which enables to define local constraints on the data received at the nodes. It is guaranteed that as long as none of these constraints has been violated, the value of the function does not cross the threshold. We generalize previous work on geometric monitoring, and solve two problems which seriously hampered its performance: as opposed to the constraints used so far, which depend only on the current values of the local input vectors, here we incorporate their temporal behavior into the constraints. Also, the new constraints are tailored to the geometric properties of the specific function which is being monitored, while the previous constraints were generic.
Experimental results on real world data reveal that using the new geometric constraints reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between existing results and a newly defined lower bound on the communication complexity.
- Shipra Agrawal, Supratim Deb, K. V. M. Naidu, and Rajeev Rastogi. Efficient detection of distributed constraint violations. In ICDE '07, pages 1320--1324.Google Scholar
- Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In STOC '96, pages 20--29. Google Scholar
- Arvind Arasu and Gurmeet Singh Manku. Approximate counts and quantiles over sliding windows. In PODS '04, pages 286--296. Google Scholar
- Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In PODS '02, pages 1--16. Google Scholar
- Brian Babcock and Chris Olston. Distributed top-k monitoring. In SIGMOD '03, pages 28--39. Google Scholar
- Donald Carney, Ugur ¸Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, and Stanley B. Zdonik. Monitoring streams - a new class of data management applications. In VLDB '02, pages 215--226. Google Scholar
- Amit Chakrabarti, Graham Cormode, and Andrew McGregor. A near-optimal algorithm for computing the entropy of a stream. In SODA '07. Google Scholar
- Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP '02, pages 693--703. Google Scholar
- Edith Cohen and Martin J. Strauss. Maintaining time-decaying stream aggregates. J. Algorithms, 59(1):19--36. Google Scholar
- G. Cormode, R. Keralapura, and J. Ramimirtham. Communication-efficient distributed monitoring of thresholded counts. In SIGMOD '06. Google Scholar
- Graham Cormode and Minos Garofalakis. Sketching streams through the net: distributed approximate query tracking. In VLDB '05, pages 13--24. Google Scholar
Digital Library
- Graham Cormode, Minos Garofalakis, S. Muthukrishnan, and Rajeev Rastogi. Holistic aggregates in a networked world: distributed tracking of approximate quantiles. In SIGMOD '05, pages 25--36. Google Scholar
- Graham Cormode, S. Muthukrishnan, and Wei Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE '07, pages 1036--1045.Google Scholar
- Graham Cormode, S. Muthukrishnan, and Wei Zhuang. What's different: Distributed, continuous monitoring of duplicate-resilient aggregates on data streams. In ICDE '06, page 57. Google Scholar
- Abhinandan Das, Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi. Distributed set-expression cardinality estimation. In VLDB '04, pages 312--323. Google Scholar
- Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows: (extended abstract). In SODA '02, pages 635--644. Google Scholar
- Mark Dilman and Danny Raz. Efficient reactive monitoring. In INFOCOM '01, pages 1012--1019.Google Scholar
- Gereon Frahling, Piotr Indyk, and Christian Sohler. Sampling in dynamic data streams and applications. In SCG '05, pages 142--149. Google Scholar
- Ling Huang, Minos Garofalakis, Joseph Hellerstein, Anthony Joseph, and Nina Taft. Toward sophisticated detection with distributed triggers. In MineNet '06, pages 311--316. Google Scholar
- Ling Huang, Minos Garofalakis, Joseph Hellerstein, Anthony Joseph, and Nina Taft. Toward sophisticated detection with distributed triggers. In MineNet '06, pages 311--316. Google Scholar
- Ankur Jain, Joseph M. Hellerstein, Sylvia Ratnasamy, and David Wetherall. A wakeup call for internet monitoring systems: The case for distributed triggers. In Proc. 3rd ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets), 2004.Google Scholar
- David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004. Google Scholar
- David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004. Google Scholar
- Samuel Madden, Mehul Shah, Joseph M. Hellerstein, and Vijayshankar Raman. Continuously adaptive continuous queries over streams. In SIGMOD '02, pages 49--60. Google Scholar
- Amit Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, and Christopher Olston. Finding (recently) frequent items in distributed data streams. In ICDE '05, pages 767--778. Google Scholar
- Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In VLDB '02, pages 346--357. Google Scholar
- Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In VLDB '02, pages 346--357. Google Scholar
- P.A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293--320, 2003.Google Scholar
Digital Library
- T.G. Rose, M. Stevenson, and M. Whitehead. The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources. In LREC '02, pages 827--832.Google Scholar
- Izchak Sharfman, Assaf Schuster, and Daniel Keren. A geometric approach to monitoring threshold functions over distributed data streams. In SIGMOD '06, pages 301--312. Google Scholar
- Yiming Yang and Jan O. Pedersen. A comparative study on feature selection in text categorization. In ICML '97, pages 412--420. Google Scholar
- Byoung-Kee Yi, Nikolaos Sidiropoulos, Theodore Johnson, H. V. Jagadish, Christos Faloutsos, and Alexandros Biliris. Online data mining for co-evolving time sequences. In ICDE '00, page 13. Google Scholar
- Yonggang Jerry Zhao, Ramesh Govindan, and Deborah Estrin. Computing aggregates for monitoring wireless sensor networks. In SNPA 03.Google Scholar
- Yunyue Zhu and Dennis Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB '02, pages 358--369. Google Scholar
Index Terms
Shape sensitive geometric monitoring
Recommendations
Shape Sensitive Geometric Monitoring
An important problem in distributed, dynamic databases is to continuously monitor the value of a function defined on the nodes, and check that it satisfies some threshold constraint. We introduce a monitoring method, based on a geometric interpretation ...
Geometric modeling in shape space
SIGGRAPH '07: ACM SIGGRAPH 2007 papersWe present a novel framework to treat shapes in the setting of Riemannian geometry. Shapes -- triangular meshes or more generally straight line graphs in Euclidean space -- are treated as points in a shape space. We introduce useful Riemannian metrics ...
Geometric modeling in shape space
We present a novel framework to treat shapes in the setting of Riemannian geometry. Shapes -- triangular meshes or more generally straight line graphs in Euclidean space -- are treated as points in a shape space. We introduce useful Riemannian metrics ...






Comments