Abstract
Research in data and information quality has made significant strides over the last 20 years. It has become a unified body of knowledge that incorporates techniques, methods, and applications from a variety of disciplines, including information systems, computer science, operations management, organizational behavior, psychology, and statistics. With organizations viewing “Big Data,” social media data, data-driven decision-making, and analytics as critical, data quality has never been more important. We believe that data quality research is reaching the threshold of significant growth and of a metamorphosis: a shift from focusing on measuring and assessing data quality (content) toward a focus on usage and context. At this stage, it is vital to understand the identity of this research area, both to recognize its current state and to identify the growing number of research opportunities within it. Using Latent Semantic Analysis (LSA) to analyze the abstracts of 972 peer-reviewed journal and conference articles published over the past 20 years, this article identifies the core topics and themes that define the identity of data quality research. It further explores how these topics have trended over time, points to the data quality dimensions that have and have not been well studied, and offers insights into topics that may provide significant research opportunities in this area.
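The abstract names Latent Semantic Analysis as its method without showing the mechanics. The following is a minimal sketch of the general technique, not the study's pipeline. It assumes a toy corpus of made-up abstracts, a hypothetical stop-word list, and smoothed TF-IDF weighting; the study's actual preprocessing and weighting choices are not stated here. It builds a term-by-document matrix and extracts its leading singular vectors. Terms with large loadings on a factor suggest a candidate topic. It uses plain Python (power iteration on the Gram matrix with orthogonal deflation) so that it runs without external libraries.

```python
import math
import random
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller list and usually stemming.
STOP = {"the", "of", "and", "a", "in", "to", "for", "on", "we", "is"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words and very short tokens."""
    words = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return [w for w in words if w not in STOP and len(w) > 2]

def term_doc_matrix(docs):
    """Return (vocab, matrix) with matrix[term][doc] = count * smoothed IDF (an assumption)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    matrix = [[c[t] * math.log(1 + n_docs / df[t]) for c in counts] for t in vocab]
    return vocab, matrix

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def top_singular_triplets(a, k, iters=300, seed=0):
    """Top-k (sigma, u, v) of a (terms x docs) via power iteration on A^T A."""
    rng = random.Random(seed)
    at = transpose(a)
    # Gram matrix A^T A (docs x docs); its eigenvectors are the right singular vectors v.
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in at] for ci in at]
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in gram]
        for _ in range(iters):
            w = matvec(gram, v)
            # Deflation: project out the singular vectors already found.
            for _, _, prev in triplets:
                dot = sum(x * y for x, y in zip(w, prev))
                w = [x - dot * p for x, p in zip(w, prev)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                return triplets
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(gram, v)))  # Rayleigh quotient
        sigma = math.sqrt(max(lam, 0.0))
        if sigma == 0:
            break
        u = [x / sigma for x in matvec(a, v)]  # left singular vector: term loadings
        triplets.append((sigma, u, v))
    return triplets

# Toy, made-up abstracts (not data from the study).
abstracts = [
    "data quality assessment and measurement of accuracy",
    "measuring accuracy and completeness in data quality assessment",
    "context of data usage in decision making",
    "decision making and context dependent data usage",
]
vocab, a = term_doc_matrix(abstracts)
for sigma, u, v in top_singular_triplets(a, k=2):
    top = sorted(range(len(vocab)), key=lambda i: -abs(u[i]))[:3]
    print(round(sigma, 3), [vocab[i] for i in top])
```

Working on the small document-by-document Gram matrix keeps the sketch short. For a corpus of hundreds of abstracts and thousands of terms, a library truncated SVD would be the practical choice.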
From Content to Context: The Evolution and Growth of Data Quality Research