From Content to Context: The Evolution and Growth of Data Quality Research

Online: 4 January 2017

Abstract

Research in data and information quality has made significant strides over the last 20 years. It has become a unified body of knowledge incorporating techniques, methods, and applications from a variety of disciplines including information systems, computer science, operations management, organizational behavior, psychology, and statistics. With organizations viewing “Big Data”, social media data, data-driven decision-making, and analytics as critical, data quality has never been more important. We believe that data quality research is reaching the threshold of significant growth and a metamorphosis from focusing on measuring and assessing data quality—content—toward a focus on usage and context. At this stage, it is vital to understand the identity of this research area in order to recognize its current state and to effectively identify an increasing number of research opportunities within. Using Latent Semantic Analysis (LSA) to analyze the abstracts of 972 peer-reviewed journal and conference articles published over the past 20 years, this article contributes by identifying the core topics and themes that define the identity of data quality research. It further explores their trends over time, pointing to the data quality dimensions that have—and have not—been well-studied, and offering insights into topics that may provide significant opportunities in this area.

Published in

Journal of Data and Information Quality, Volume 8, Issue 2
Challenge Papers and Research Papers
February 2017
62 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3035914

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Online: 4 January 2017
• Accepted: 1 September 2016
• Revised: 1 June 2016
• Received: 1 December 2015

Qualifiers

• research-article
• Research
• Refereed
