Abstract
Despite the anticipated use of event correlation techniques for monitoring critical complex infrastructures and for dealing with disasters in the physical world, little work exists on making event correlation systems themselves tolerant to failure. Existing systems either provide no guarantees on event deliveries, do not support multicast and thus provide no guarantees across individual processes, or else rely on centralized components or strong assumptions about the infrastructure.
The FAIDECS system attempts to reconcile strong guarantees with practical performance in the presence of process crash failures. To that end, FAIDECS uses an overlay network whose specific guarantees are aligned with its proposed correlation language and the guarantees that language offers. However, the proposed language lacks expressivity, and the system itself supports only very specific, rigid semantics; it cannot support even fundamental features such as sliding windows.
After providing a comprehensive overview of the FAIDECS model and system, this article bridges the gap between strong guarantees and more established correlation languages and systems in three steps. First, we propose alternative semantics for several modules of the FAIDECS matching engine and revisit the guarantees accordingly. Second, we pinpoint which guarantees are contradicted by which combinations of semantic options. Third, we investigate four correlation languages (StreamSQL, EQL, CEL, and TESLA), showing which semantic options their respective features correspond to in our model and thus, ultimately, which guarantees of FAIDECS are maintained by which language features.
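To make the sliding windows mentioned above concrete, the following Python sketch contrasts count-based tumbling (non-overlapping) windows, in which every event is consumed by at most one correlated set, with count-based sliding windows, in which a single event can take part in several deliveries. It is a generic illustration under our own assumptions (window size counted in events, a slide of one event, and the hypothetical function names `tumbling_windows` and `sliding_windows`); it is not the FAIDECS matching engine or any API from the article.

```python
from collections import deque
from typing import Iterable, List, Tuple

# Hypothetical helpers for illustration only; not part of FAIDECS.

def tumbling_windows(events: Iterable[str], size: int) -> List[Tuple[str, ...]]:
    """Group events into consecutive, non-overlapping windows of `size` events.
    Each event is consumed by at most one window; a trailing partial window is dropped."""
    out: List[Tuple[str, ...]] = []
    buf: List[str] = []
    for e in events:
        buf.append(e)
        if len(buf) == size:
            out.append(tuple(buf))
            buf = []  # events are consumed: start a fresh, empty window
    return out


def sliding_windows(events: Iterable[str], size: int) -> List[Tuple[str, ...]]:
    """Emit the last `size` events each time a new event arrives (count-based, slide of 1).
    An event can appear in up to `size` emitted windows."""
    out: List[Tuple[str, ...]] = []
    buf = deque(maxlen=size)  # appending to a full deque evicts the oldest event
    for e in events:
        buf.append(e)
        if len(buf) == size:
            out.append(tuple(buf))
    return out


if __name__ == "__main__":
    stream = ["e1", "e2", "e3", "e4"]
    print(tumbling_windows(stream, 2))  # [('e1', 'e2'), ('e3', 'e4')]
    print(sliding_windows(stream, 2))   # [('e1', 'e2'), ('e2', 'e3'), ('e3', 'e4')]
```

In the sliding case, `e2` and `e3` each belong to two windows, whereas with tumbling windows every event belongs to exactly one.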
- Daniel J. Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. 2003. Aurora: A new model and architecture for data stream management. VLDB J. 12, 2 (Aug. 2003), 120--139. DOI: http://dx.doi.org/10.1007/s00778-003-0095-z
- Marcos K. Aguilera, Robert E. Strom, Daniel C. Sturman, Mark Astley, and Tushar D. Chandra. 1999. Matching events in a content-based subscription system. In Proceedings of the 18th Annual ACM Symposium on Principles of Distributed Computing (PODC'99). ACM, New York, NY, 53--61. DOI: http://dx.doi.org/10.1145/301308.301326
- Marcos Kawazoe Aguilera and Sam Toueg. 1996. Randomization and failure detection: A hybrid approach to solve consensus. In Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG'96). Lecture Notes in Computer Science, vol. 1151, Springer-Verlag, Berlin, Heidelberg, 29--39. http://dl.acm.org/citation.cfm?id=645953.675629
- Magdalena Balazinska, Hari Balakrishnan, Samuel R. Madden, and Michael Stonebraker. 2008. Fault-tolerance in the Borealis distributed stream processing system. ACM Trans. Database Syst. 33, 1, Article 3 (2008). DOI: http://dx.doi.org/10.1145/1331904.1331907
- Roberto Baldoni, Silvia Bonomi, Marco Platania, and Leonardo Querzoni. 2012. Dynamic message ordering for topic-based publish/subscribe systems. In Proceedings of the IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS'12). IEEE Computer Society, 909--920. DOI: http://dx.doi.org/10.1109/IPDPS.2012.86
- Anindya Basu, Bernadette Charron-Bost, and Sam Toueg. 1996. Simulating reliable links with unreliable links in the presence of process crashes. In Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG'96). Lecture Notes in Computer Science, vol. 1151. Springer-Verlag, Berlin, Heidelberg, 105--122. http://dl.acm.org/citation.cfm?id=645953.675641
- Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher, Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. 2007. Cayuga: A high-performance event processing engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, NY, 1100--1102. DOI: http://dx.doi.org/10.1145/1247480.1247620
- Antonio Carzaniga, David S. Rosenblum, and Alexander L. Wolf. 2001. Design and evaluation of a wide-area event notification service. ACM Trans. Comput. Syst. 19, 3 (2001), 332--383. DOI: http://dx.doi.org/10.1145/380749.380767
- Sharma Chakravarthy, V. Krishnaprasad, Eman Anwar, and S.-K. Kim. 1994. Composite events for active databases: Semantics, contexts and detection. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94). Morgan Kaufmann Publishers Inc., San Francisco, CA, 606--617. http://dl.acm.org/citation.cfm?id=645920.672994
- Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable failure detectors for reliable distributed systems. J. ACM 43, 2 (1996), 225--267. DOI: http://dx.doi.org/10.1145/226643.226647
- Gianpaolo Cugola and Alessandro Margara. 2010. TESLA: A formally defined event specification language. In Proceedings of the 4th ACM International Conference on Distributed Event-Based Systems (DEBS'10). ACM, New York, NY, 50--61. DOI: http://dx.doi.org/10.1145/1827418.1827427
- Xavier Défago, André Schiper, and Péter Urbán. 2004. Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Comput. Surv. 36, 4 (2004), 372--421. DOI: http://dx.doi.org/10.1145/1041680.1041682
- Alan Demers, Johannes Gehrke, Mingsheng Hong, Mirek Riedewald, and Walker White. 2006. Towards expressive publish/subscribe systems. In Proceedings of the 10th International Conference on Advances in Database Technology (EDBT'06). Lecture Notes in Computer Science, vol. 3896, Springer-Verlag, Berlin, Heidelberg, 627--644. DOI: http://dx.doi.org/10.1007/11687238_38
- Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (1985), 374--382. DOI: http://dx.doi.org/10.1145/3149.214121
- S. Gatziu and K. R. Dittrich. 1994. Detecting composite events in active database systems using Petri nets. In Proceedings of the 4th International Workshop on Research Issues in Data Engineering. Active Database Systems. 2--9. DOI: http://dx.doi.org/10.1109/RIDE.1994.282859
- Narain H. Gehani, H. V. Jagadish, and Oded Shmueli. 1992. Composite event specification in active databases: Model & implementation. In Proceedings of the 18th International Conference on Very Large Data Bases (VLDB'92). Morgan Kaufmann Publishers Inc., San Francisco, CA, 327--338. http://dl.acm.org/citation.cfm?id=645918.672484
- Vassos Hadzilacos and Sam Toueg. 1993. Fault-tolerant broadcasts and related problems. Distributed Systems (2nd Ed.) ACM Press/Addison-Wesley Publishing Co., New York, NY. 97--145. http://dl.acm.org/citation.cfm?id=302430.302435
- Waldemar Hummer, Christian Inzinger, Philipp Leitner, Benjamin Satzger, and Schahram Dustdar. 2012. Deriving a unified fault taxonomy for event-based systems. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS'12). ACM, New York, NY, 167--178. DOI: http://dx.doi.org/10.1145/2335484.2335504
- Gabriela Jacques-Silva, Jim Challenger, Lou Degenaro, James Giles, and Rohit Wagle. 2007. Towards autonomic fault recovery in System-S. In Proceedings of the 4th International Conference on Autonomic Computing (ICAC'07). IEEE Computer Society, 31--. DOI: http://dx.doi.org/10.1109/ICAC.2007.40
- Namit Jain, Shailendra Mishra, Anand Srinivasan, Johannes Gehrke, Jennifer Widom, Hari Balakrishnan, Uǧur Çetintemel, Mitch Cherniack, Richard Tibbetts, and Stan Zdonik. 2008. Towards a streaming SQL standard. Proc. VLDB Endow. 1, 2 (2008), 1379--1390. DOI: http://dx.doi.org/10.14778/1454159.1454179
- Ramana Rao Kompella, Jennifer Yates, Albert Greenberg, and Alex C. Snoeren. 2005. IP fault localization via risk modeling. In Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation (NSDI'05). USENIX Association, Berkeley, CA, USA, 57--70. http://dl.acm.org/citation.cfm?id=1251203.1251208
- Christopher Krügel, Thomas Toth, and Clemens Kerer. 2002. Decentralized event correlation for intrusion detection. In Proceedings of the 4th International Conference Seoul on Information Security and Cryptology (ICISC'01). Lecture Notes in Computer Science, vol. 2288, Springer-Verlag, Berlin, Heidelberg, 114--131. http://dl.acm.org/citation.cfm?id=646283.687988
- Guoli Li and Hans-Arno Jacobsen. 2005. Composite subscriptions in content-based publish/subscribe systems. In Proceedings of the ACM/IFIP/USENIX International Conference on Middleware (Middleware'05). Lecture Notes in Computer Science, vol. 3790, Springer-Verlag, Berlin, Heidelberg, 249--269. http://dl.acm.org/citation.cfm?id=1515890.1515903
- Cristian Lumezanu, Neil Spring, and Bobby Bhattacharjee. 2006. Decentralized message ordering for publish/subscribe systems. In Proceedings of the ACM/IFIP/USENIX International Conference on Middleware (Middleware'06). Lecture Notes in Computer Science, vol. 4290, Springer-Verlag, Berlin, Heidelberg, 162--179. http://dl.acm.org/citation.cfm?id=1515984.1515997
- Peter R. Pietzuch, Brian Shand, and Jean Bacon. 2003. A framework for event composition in distributed systems. In Proceedings of the ACM/IFIP/USENIX International Conference on Middleware (Middleware'03). Lecture Notes in Computer Science, vol. 2672, Springer-Verlag, Berlin, Heidelberg, 62--82. http://dl.acm.org/citation.cfm?id=1515915.1515921
- Zhengping Qian, Yong He, Chunzhi Su, Zhuojie Wu, Hongyu Zhu, Taizhi Zhang, Lidong Zhou, Yuan Yu, and Zheng Zhang. 2013. TimeStream: Reliable stream computation in the Cloud. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys'13). ACM, New York, NY, 1--14. DOI: http://dx.doi.org/10.1145/2465351.2465353
- Heiko Sturzrehm, Pascal Felber, and Christof Fetzer. 2009. TM-Stream: An STM framework for distributed event stream processing. In Proceedings of the IEEE International Symposium on Parallel Distributed Processing (IPDPS'09). 1--8. DOI: http://dx.doi.org/10.1109/IPDPS.2009.5161084
- P. Triantafillou and A. Economides. 2004. Subscription summarization: A new paradigm for efficient publish/subscribe systems. In Proceedings of the 24th International Conference on Distributed Computing Systems. 562--571. DOI: http://dx.doi.org/10.1109/ICDCS.2004.1281623
- Gregory Aaron Wilkin and Patrick Eugster. 2013. Multicasting in the presence of aggregated deliveries. J. Parallel Distrib. Comput. 73, 4 (2013), 544--556. DOI: http://dx.doi.org/10.1016/j.jpdc.2012.12.004
- Gregory Aaron Wilkin, Patrick Eugster, and K. R. Jayaram. 2014. Decentralized fault tolerant event-correlation. Technical Report. http://www.jayaramkr.com/files/FAIDECSTechReport.pdf
- Gregory Aaron Wilkin, K. R. Jayaram, Patrick Eugster, and Ankur Khetrapal. 2011. FAIDECS: Fair decentralized event correlation. In Proceedings of the 12th ACM/IFIP/USENIX International Conference on Middleware (Middleware'11). Lecture Notes in Computer Science, vol. 7049, Springer-Verlag, Berlin, Heidelberg, 228--248. DOI: http://dx.doi.org/10.1007/978-3-642-25821-3_12
- Kaiwen Zhang, Vinod Muthusamy, and Hans-Arno Jacobsen. 2012. Total order in content-based publish/subscribe systems. In Proceedings of the IEEE 32nd International Conference on Distributed Computing Systems (ICDCS). 335--344. DOI: http://dx.doi.org/10.1109/ICDCS.2012.17
- Yuanyuan Zhao and Rob Strom. 2001. Exploiting event stream interpretation in publish-subscribe systems. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (PODC'01). ACM, New York, NY, 219--228. DOI: http://dx.doi.org/10.1145/383962.384023
Decentralized Fault-Tolerant Event Correlation