
Data science and prediction

Published: 01 December 2013

Abstract

Big data promises automated actionable knowledge creation and predictive models for use by both humans and computers.

Reviews

Charles Kenneth Davis

This is an enlightening treatise on data science. There is no hype here, just a thought-provoking piece that articulates fundamental concepts and implications. The natural audience is the IT or business professional (or manager) who is interested in acquiring a clearer understanding of modern data science. Focusing primarily on examples from the healthcare industry, this article explains succinctly why “big data” really is different: it changes well-established approaches to creating knowledge. The author begins by defining “data science” as the “generalizable extraction of knowledge from data,” focusing on the notions that much of today's data is unstructured and that traditional database models are mostly unsuitable for such data. After this introduction, he develops the core thesis of the article with a discussion of prediction and machine learning. The conventional approach to creating knowledge is to build a theory in the human mind based upon previously established theories and then to verify the new theory by collecting and analyzing appropriate data. The author points out that big data turns this on its head by making it possible for machine learning algorithms to build good models for predicting outcomes with little understanding of key underlying relationships and with no theoretical framework to support those models. Furthermore, since these models are based on the data and are essentially computer-generated, they can be made to evolve in conjunction with the processes that create their data. There is no need to rebuild theory as the situation changes in order to build new models. All of this, of course, portends fully automated decision making on a grand scale. This is an important article for those who wish to understand the rationale and potential for data science. The focus is not so much on analytics, per se, as it is on machine-based prediction and machine-based decision making.

This informative article lays the conceptual groundwork for these insights and explains how and why machine learning is the true driving force behind the future of the data science phenomenon.

Online Computing Reviews Service

Ahmed S Nagy

Dhar presents a theory of data science that addresses the challenges and caveats of dealing with big data. The study is well documented and easy to read for a wide audience. It is a useful guide for understanding timely challenges in the area of big data. The review fits well with recent developments in knowledge modeling and the semantic web. The article presents a new perspective on big data and illustrates it with real-world situations. Dhar argues that we are moving toward a big data era in which computers will be better decision makers than people in many situations. Though that is a bold statement, it is true to a great extent. Dhar emphasizes the limitations of knowledge discovery techniques by arguing that all models are wrong, yet some are useful. The article explains the usefulness of machine learning as an approach for discovering interesting patterns in data. Dhar argues that big data, by enabling validation, helps reduce errors that result from model misspecification and from small samples. He concludes that big data makes it feasible to uncover the causal models generating the data by using machine learning to model big data.

Online Computing Reviews Service

Published In

Communications of the ACM, Volume 56, Issue 12
December 2013
102 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/2534706
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2013
Published in CACM Volume 56, Issue 12

Qualifiers

  • Research-article
  • Popular
  • Refereed

Article Metrics

  • Downloads (last 12 months): 4,762
  • Downloads (last 6 weeks): 421
Reflects downloads up to 16 Jan 2025

Cited By

  • (2025) Data Science. Encyclopedia of Libraries, Librarianship, and Information Science, pp. 89-96. DOI: 10.1016/B978-0-323-95689-5.00268-6. Online publication date: 2025.
  • (2024) Revolution Ethics of Data Science and AI. The Ethical Frontier of AI and Data Analysis, pp. 245-256. DOI: 10.4018/979-8-3693-2964-1.ch015. Online publication date: 12-Apr-2024.
  • (2024) Implementing AI in Your Leadership Strategy. Holistic Approach to AI and Leadership, pp. 277-334. DOI: 10.4018/979-8-3693-2695-4.ch010. Online publication date: 28-Jun-2024.
  • (2024) Automated Management Processes. Holistic Approach to AI and Leadership, pp. 96-119. DOI: 10.4018/979-8-3693-2695-4.ch005. Online publication date: 28-Jun-2024.
  • (2024) Data-Driven Decision Making. Holistic Approach to AI and Leadership, pp. 73-95. DOI: 10.4018/979-8-3693-2695-4.ch004. Online publication date: 28-Jun-2024.
  • (2024) Competitive Data Use, Analysis, and Big Data Applications in Online Advertising. Advancements in Socialized and Digital Media Communications, pp. 264-291. DOI: 10.4018/979-8-3693-0855-4.ch018. Online publication date: 26-Jan-2024.
  • (2024) Effective Modeling of CO2 Emissions for Light-Duty Vehicles: Linear and Non-Linear Models with Feature Selection. Energies 17:7 (1655). DOI: 10.3390/en17071655. Online publication date: 29-Mar-2024.
  • (2024) Evolution of Management Information Systems by Super Artificial Intelligence Revolutions. Uluslararası Yönetim Bilişim Sistemleri ve Bilgisayar Bilimleri Dergisi 8:2, pp. 127-142. DOI: 10.33461/uybisbbd.1521086. Online publication date: 30-Dec-2024.
  • (2024) Scoping review: Machine learning interventions in the management of healthcare systems. DIGITAL HEALTH 10. DOI: 10.1177/20552076221144095. Online publication date: 22-Oct-2024.
  • (2024) How to improve data quality to empower business decision-making process and business strategy agility in the AI age. Business Information Review 41:3, pp. 124-129. DOI: 10.1177/02663821241264705. Online publication date: 25-Jun-2024.