Abstract
Sensors in smart-item environments capture data about product conditions and usage to support business decisions as well as production automation processes. A challenging issue in this application area is the limited quality of sensor data, which results from restricted sensor precision and sensor failures. Moreover, the data stream processing needed to meet resource constraints in streaming environments introduces additional noise and further decreases data quality. To avoid incorrect business decisions caused by dirty data, quality characteristics have to be captured, processed, and provided to the respective business task. However, how to efficiently provide applications with information about data quality is still an open research problem.
In this article, we address this problem by presenting a flexible model for the propagation and processing of data quality. A comprehensive analysis of common data stream processing operators and their impact on data quality enables meaningful data evaluation and reduces the risk of incorrect business decisions. Further, we propose data quality model control, which adapts the data quality granularity to the interestingness of the data stream.
Representing Data Quality in Sensor Data Streaming Environments