Survey
Open Access

Data Science: A Comprehensive Overview

Published: 29 June 2017

Abstract

The 21st century has ushered in the age of big data and the data economy, in which data DNA, carrying important knowledge, insights, and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics. Although it is widely debated whether big data is merely hype and buzz, and data science is still at a very early stage, significant challenges and opportunities are emerging from, or have been inspired by, research, innovation, business, the profession, and education in data science. This article provides a comprehensive survey and tutorial of the fundamental aspects of data science: the evolution from data analysis to data science, data science concepts, a big picture of the era of data science, the major challenges and directions in data innovation, the nature of data analytics, new industrialization and service opportunities in the data economy, the data profession and its competencies, data education, and the future of data science. This article is the first in the field to draw a comprehensive big picture, in addition to offering rich observations, lessons, and reflections on data science and analytics.

References

  1. ACEMS. 2014. The Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers. Retrieved from acems.org.au/.Google ScholarGoogle Scholar
  2. Ritu Agarwal and Vasant Dhar. 2014. Editorial-big data, data science, and analytics: The opportunity and challenge for IS research. Information Systems Research 25, 3 (2014), 443--448. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Xinhua News Agency. 2016. The 13th Five-Year Plan for the National Economic and Social Development of the People’s Republic of China. Retrieved from http://news.xinhuanet.com/politics/2016lh/2016-03/17/c_1118366322.htm.Google ScholarGoogle Scholar
  4. AGIMO. 2013. AGIMO Big Data Strategy - Issues Paper. Retrieved from www.finance.gov.au/files/2013/03/Big-Data-Strategy-Issues-Paper1.pdf.Google ScholarGoogle Scholar
  5. Paul E. Anderson, James F. Bowring, Rene McCauley, George Pothering, and Christopher W. Starr. 2014. An undergraduate degree in data science: Curriculum and a decade of implementation experience. In Proceedings of the 45th ACM Technical Symposium on Computer Science Education (SIGCSE’14). 145--150. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. ASA. 2015. ASA views on data science. Retrieved from http://magazine.amstat.org/?s=data+science8x=08y=0.Google ScholarGoogle Scholar
  7. AU. 1990. Data-matching Program. Retrieved from http://www.comlaw.gov.au/Series/C2004A04095.Google ScholarGoogle Scholar
  8. AU. 2010. Declaration of Open Government. Retrieved from http://agimo.gov.au/2010/07/16/declaration-of-open-government/.Google ScholarGoogle Scholar
  9. AU. 2013. Attorney-General’s Department. Retrieved from http://www.attorneygeneral.gov.au/Mediareleases/Pages/2013/Seconder/22May2013-AustraliajoinsOpenGovernmentPartnership.aspx.Google ScholarGoogle Scholar
  10. AU. 2016. Australia Big Data. Retrieved from http://www.finance.gov.au/big-data/.Google ScholarGoogle Scholar
  11. Kayode Ayankoya, André P. Calitz, and Jean Greyling. 2014. Intrinsic relations between data science, big data, business analytics and datafication. ACM International Conference Proceeding Series 28 (2014), 192--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. John Bailer, Roger Hoer, David Madigan, Jill Montaquila, and Tommy Wright. 2012. Report of the ASA workgroup on master’s degrees. Retrieved from http://magazine.amstat.org/wp-content/uploads/2013an/masterworkgroup.pdf.Google ScholarGoogle Scholar
  13. Ben Baumer. 2015. A data science course for undergraduates: Thinking with data. The American Statistician 69, 4 (2015), 334--342. Google ScholarGoogle ScholarCross RefCross Ref
  14. BDL. 2016a. Big Data Landscape. Retrieved from www.bigdatalandscape.com.Google ScholarGoogle Scholar
  15. BDL. 2016b. Big Data Landscape 2016 (Version 3.0). Retrieved from http://mattturck.com/2016/02/01/big-data-landscape/.Google ScholarGoogle Scholar
  16. Mark A. Beyer and Douglas Laney. 2012. The Importance of ‘Big Data’: A Definition. Retrieved from https://www.gartner.com/doc/2057415 Gartner.Google ScholarGoogle Scholar
  17. Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshp, Aaron J. Elmore, Samuel Madden, and Aditya Parameswaran. 2015. Datahub: Collaborative data science 8 dataset version management at scale. In CIDR.Google ScholarGoogle Scholar
  18. BigML. 2016. BigML. Retrieved from https://bigml.com/.Google ScholarGoogle Scholar
  19. Kirk D. Borne, Suzanne Jacoby, Karen Carney, Andy Connolly, Timothy Eastman, M. Jordan Raddick, J. A. Tyson, and John Wallin. 2010. The revolution in astronomy education: Data science for the masses. Retrieved from http://arxiv.org/pdf/0909.3895v1.pdf.Google ScholarGoogle Scholar
  20. Sebastien Boyer, Ben U. Gelman, Benjamin Schreck, and Kalyan Veeramachaneni. 2015. Data science foundry for MOOCs. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA’15). 1--10. Google ScholarGoogle ScholarCross RefCross Ref
  21. Leo Breiman. 2001. Statistical modeling: The two cultures. Statistical Science 16, 3 (2001), 199--231. Google ScholarGoogle ScholarCross RefCross Ref
  22. Gavin Brown. 2009. Review of Education in Mathematics, Data Science and Quantitative Disciplines: Report to the Group of Eight Universities. Retrieved from https://go8.edu.au/publication/go8-review-education-mathematics-data-scie nce-and-quantitative-disciplines.Google ScholarGoogle Scholar
  23. Linda Burtch. 2014. The Burtch Works Study: Salaries of Data Scientists. Retrieved from http://www.burtchworks.com/files/2014/07/Burtch-Works-Study_DS_final.pdf.Google ScholarGoogle Scholar
  24. Kanyarat Bussaban and Phanu Waraporn. 2015. Preparing undergraduate students majoring in computer science and mathematics with data science perspectives and awareness in the age of big data. In Proceedings of the 7th World Conference on Educational Sciences, Vol. 197. 1443--1446. Google ScholarGoogle ScholarCross RefCross Ref
  25. CA. 2016. Canada Capitalizing on Big Data. http://www.sshrc-crsh.gc.ca/news_room-salle_de_presse/latest_news-nouvell es_recentes/big_data_consultation-donnees_massives_consultation-eng.aspx.Google ScholarGoogle Scholar
  26. Longbing Cao. 2010a. Domain driven data mining: Challenges and prospects. IEEE Transactions on Knowledge and Data Engineering 22, 6 (2010), 755--769. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Longbing Cao. 2010b. In-depth behavior understanding and use: The behavior informatics approach. Information Science 180, 17 (2010), 3067--3085. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Longbing Cao. 2011. Strategic Recommendations on Advanced Data Industry and Services for the Yanhuang Science and Technology Park.Google ScholarGoogle Scholar
  29. Longbing Cao. 2014. Non-IIDness learning in behavioral and social data. The Computer Journal 57, 9 (2014), 1358--1370. Google ScholarGoogle ScholarCross RefCross Ref
  30. Longbing Cao. 2015a. Coupling learning of complex interactions. Journal of Information Processing and Management 51, 2 (2015), 167--186. Google ScholarGoogle ScholarCross RefCross Ref
  31. Longbing Cao. 2015b. Metasynthetic Computing and Engineering of Complex Systems. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Longbing Cao. 2016a. Data science and analytics: A new era. International Journal of Data Science and Analytics 1, 1 (2016), 1--2. Google ScholarGoogle ScholarCross RefCross Ref
  33. Longbing Cao. 2016b. Data science: Challenges and directions. Technical Report, UTS Advanced Analytics Institute.Google ScholarGoogle Scholar
  34. Longbing Cao. 2016c. Data Science: Nature and Pitfalls. Technical Report, UTS Advanced Analytics Institute.Google ScholarGoogle Scholar
  35. Longbing Cao. 2016d. Data Science: Profession and Education. Technical Report, UTS Advanced Analytics Institute.Google ScholarGoogle Scholar
  36. Longbing Cao. 2017. Understand Data Science (to be published). Springer.Google ScholarGoogle Scholar
  37. Longbing Cao and Ruwei Dai. 2008. Open Complex Intelligent Systems. Post Telecom Press.Google ScholarGoogle Scholar
  38. Longbing Cao, Ruwei Dai, and Mengchu Zhou. 2009. Metasynthesis: M-space, m-interaction and m-computing for open complex giant systems. IEEE Transactions on Systems, Man, and Cybernetics--Part A 39, 5 (2009), 1007--1021. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Longbing Cao and Philip S. Yu (Eds). 2012. Behavior Computing: Modeling, Analysis, Mining and Decision. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Longbing Cao, Yuming Ou, and Philip S Yu. 2012. Coupled behavior analysis with applications. IEEE Transactions on Knowledge and Data Engineering 24, 8 (2012), 1378--1392. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Longbing Cao, Philip S. Yu, Chengqi Zhang, and Yanchang Zhao. 2010. Domain Driven Data Mining. Springer. Google ScholarGoogle ScholarCross RefCross Ref
  42. Capterra. 2016a. Top Project Management Tools. Retrieved from http://www.capterra.com/project-management-software/.Google ScholarGoogle Scholar
  43. Capterra. 2016b. Top Reporting Software Products. Retrieved from http://www.capterra.com/reporting-software/.Google ScholarGoogle Scholar
  44. CBDIO. 2016. China Big Data Industrial Observation. Retrieved from www.cbdio.com.Google ScholarGoogle Scholar
  45. CCF-BDTF. 2013. China Computer Federation Task Force on Big Data. Retrieved from http://www.bigdataforum.org.cn/.Google ScholarGoogle Scholar
  46. John M. Chambers. 1993. Greater or lesser statistics: A choice for future research. Statistics and Computing 3, 4 (1993), 182--184. Google ScholarGoogle ScholarCross RefCross Ref
  47. Swami Chandrasekaran. 2013. Becoming a Data Scientist. Retrieved from http://nirvacana.com/thoughts/becoming-a-data-scientist/.Google ScholarGoogle Scholar
  48. Hsinchun Chen, Roger H. L. Chiang, and Veda C. Storey. 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly 36, 4 (2012), 1165--1188. Google ScholarGoogle ScholarCross RefCross Ref
  49. China Information Security. 2015. Big Data Strategies and Actions in Major Countries. Retrieved from http://www.cac.gov.cn/2015-07/03/c_1115812491.htm.Google ScholarGoogle Scholar
  50. Thomas R. Clancy, Kathryn H. Bowles, Lillee Gelinas, Ida Androwich, Connie Delaney, Susan Matney, Joyce Sensmeier, Judith Warren, John Welton, and Bonnie Westra. 2014. A call to action: Engage in big data science. Nursing Outlook 62, 1 (2014), 64--65. Google ScholarGoogle ScholarCross RefCross Ref
  51. Classcentral. 2016. Data Science and Big Data—Free Online Courses. Retrieved from https://www.class-central.com/subject/data-science.Google ScholarGoogle Scholar
  52. Kelly Clay. 2013. CES 2013: The Year of The Quantified Self? Retrieved from http://www.forbes.com/sites/kellyclay/2013/01/06/ces-2013-the-year-of-the-quantified-self/♯4cf4d2b55e74.Google ScholarGoogle Scholar
  53. William S. Cleveland. 2001. Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review 69, 1 (2001), 21--26. Google ScholarGoogle ScholarCross RefCross Ref
  54. CMIST. 2016. China Will Establish A Series of National Labs. Retrieved from http://news.sciencenet.cn/htmlnews/2016/4/344404.shtm.Google ScholarGoogle Scholar
  55. CNSF. 2015. National Science Foundation China. Retrieved from http://www.nsfc.gov.cn/.Google ScholarGoogle Scholar
  56. European Commission. 2014. Commission urges governments to embrace potential of big data. Retrieved from europa.eu/rapid/press-release_IP-14-769_en.htm.Google ScholarGoogle Scholar
  57. Coursera. 2016. Coursera. Retrieved from www.coursera.org/data-science.Google ScholarGoogle Scholar
  58. Kevin Crowston and Jian Qin. 2011. A capability maturity model for scientific data management: Evidence from the literature. Proceedings of the Association for Information Science and Technology 48, 10 (2011), 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. CSC. 2012. Big data universe beginning to explode. Retrieved from http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode.Google ScholarGoogle Scholar
  60. CSNSTC. 2009. Harnessing the Power of Digital Data for Science and Society: Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council. Retrieved from https://www.nitrd.gov/About/Harnessing_Power_Web.pdf.Google ScholarGoogle Scholar
  61. DABS. 2016. Data Analytics Book Series. Retrieved from http://www.springer.com/series/15063.Google ScholarGoogle Scholar
  62. DARPA. 2016. DARPA Xdata program. Retrieved from www.darpa.mil/program/xdata.Google ScholarGoogle Scholar
  63. Data61. 2016. Data61. Retrieved from https://www.data61.csiro.au/.Google ScholarGoogle Scholar
  64. DataRobot. 2016. DataRobot. Retrieved from https://www.datarobot.com/.Google ScholarGoogle Scholar
  65. Datasciences.org. 2005. Homepage. Retrieved from www.datasciences.org.Google ScholarGoogle Scholar
  66. Thomas H. Davenport and D. J. Patil. 2012. Data scientist: The sexiest job of the 21st century. Harvard Business Review (2012), 70--76.Google ScholarGoogle Scholar
  67. Jessica Davis. 2016. 10 Programming Languages And Tools Data Scientists Used. Retrieved from http://www.informationweek.com/devops/programming-languages/10-programming-languages-and-tools-data-scientists-use-now/d/d-id/1326034.Google ScholarGoogle Scholar
  68. Devendra Desale. 2015. Top 30 Social Network Analysis and Visualization Tools. Retrieved from http://www.kdnuggets.com/2015/06/top-30-social-network-analysis-visualization-tools.html.Google ScholarGoogle Scholar
  69. Vasant Dhar. 2013. Data science and prediction. Communications of the ACM 56, 12 (2013), 64--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Herman A. Dierick and Fabrizio Gabbiani. 2015. Drosophila neurobiology: No escape from ‘Big Data’ science. Current Biology 25, 14 (2015), 606--608. Google ScholarGoogle ScholarCross RefCross Ref
  71. Peter J. Diggle. 2015. Statistics: A data science for the 21st century. Journal of the Royal Statistical Society: Series A (Statistics in Society) 178, 4 (2015), 793--813. Google ScholarGoogle ScholarCross RefCross Ref
  72. David Donoho. 2015. 50 years of Data Science. Retrieved from http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf.Google ScholarGoogle Scholar
  73. Bonnie J. Dorr, Craig S. Greenberg, Peter Fontana, Mark A. Przybocki, Marion Le Bras, Cathryn A. Ploehn, Oleg Aulov, Martial Michel, E. Jim Golden, and Wo Chang. 2015. The NIST data science initiative. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA’15). 1--10. Google ScholarGoogle ScholarCross RefCross Ref
  74. DSA. 2016. Data Science Association. Retrieved from http://www.datascienceassn.org/.Google ScholarGoogle Scholar
  75. DSAA. 2014. IEEE/ACM/ASA International Conference on Data Science and Advanced Analytics. Retrieved from www.dsaa.co.Google ScholarGoogle Scholar
  76. DSC. 2016a. College 8 University Data Science Degrees. Retrieved from http://datascience.community/colleges.Google ScholarGoogle Scholar
  77. DSC. 2016b. The Data Science Community. Retrieved from http://datasciencebe.com/.Google ScholarGoogle Scholar
  78. DSCentral. 2016. Data Science Central. Retrieved from http://www.datasciencecentral.com/.Google ScholarGoogle Scholar
  79. DSE. 2015. Data Science and Engineering. Retrieved from http://link.springer.com/journal/41019.Google ScholarGoogle Scholar
  80. DSJ. 2014. Data Science Journal.Retrieved from datascience.codata.org.Google ScholarGoogle Scholar
  81. DSKD. 2007. Data Science and Knowledge Discovery Lab, UTS. Retrieved from http://www.uts.edu.au/research-and-teaching/our-research/quantum-computation-and-intelligent-systems/data-sciences-and.Google ScholarGoogle Scholar
  82. David Ewing Duncan. 2009. Experimental Man: What One Man’s Body Reveals about His Future, Your Health, and Our Toxic World. Wiley 8 Sons, New York.Google ScholarGoogle Scholar
  83. Edx. 2016. EDX Courses. Retrieved from https://www.edx.org/course?search_query=data+science.Google ScholarGoogle Scholar
  84. EMC. 2011. Data science revealed: A data-driven glimpse into the burgeoning new field. Retrieved from www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf.Google ScholarGoogle Scholar
  85. EPJDS. 2012. EPJ Data Science. Retrieved from http://epjdatascience.springeropen.com/.Google ScholarGoogle Scholar
  86. EU. 2014. EU Towards a Thriving Data-Driven Economy. Retrieved from https://ec.europa.eu/digital-single-market/en/towards-thriving-data-driven-economy.Google ScholarGoogle Scholar
  87. EU-DSA. 2016. The European Data Science Academy. Retrieved from edsa-project.eu.Google ScholarGoogle Scholar
  88. EU-OD. 2016. The European Union Open Data Portal. Retrieved from https://open-data.europa.eu/.Google ScholarGoogle Scholar
  89. Facebook. 2016. Facebook Data. Retrieved from https://www.facebook.com/careers/teams/data/.Google ScholarGoogle Scholar
  90. James H. Faghmous and Vipin Kumar. 2014. A big data guide to understanding climate change: The case for theory-guided data science. Big Data 2, 3 (2014), 155--163. Google ScholarGoogle ScholarCross RefCross Ref
  91. Joshua Fairfielda and Hannah Shteina. 2014. Big data, big problems: Emerging issues in the ethics of data science and journalism. Journal of Mass Media Ethics 29, 1 (2014), 38--51. Google ScholarGoogle ScholarCross RefCross Ref
  92. Jack Faris, Evelyne Kolker, Alex Szalay, Leon Bradlow, Ewa Deelman, Wu Feng, Judy Qiu, Donna Russell, Elizabeth Stewart, and Eugene Kolker. 2011. Communication and data-intensive science in the beginning of the 21st century. A Journal of Integrative Biology 15, 4 (2011), 213--215.Google ScholarGoogle ScholarCross RefCross Ref
  93. Tom Fawcett. 2016. Mining the quantified self: Personal knowledge discovery as a challenge for data science. Big Data 3, 4 (2016), 249--266. Google ScholarGoogle ScholarCross RefCross Ref
  94. Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. From data mining to knowledge discovery in databases. AI Magazine 17, 3 (1996), 37--54.Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. William Finzer. 2013. The data science education dilemma. Technology Innovations in Statistics Education 7, 2 (2013).Google ScholarGoogle Scholar
  96. Geoffrey Fox, Siddharth Maini, Howard Rosenbaum, and David J. Wild. 2015. Data science and online education. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom’15). 582--587. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Peter Fox and James Hendler. 2014. The science of data science. Big Data 2, 2 (2014), 68--70. Google ScholarGoogle ScholarCross RefCross Ref
  98. Molly Galetto. 2016. Top 50 Data Science Resources. Retrieved from http://www.ngdata.com/top-data-science-resources/?.Google ScholarGoogle Scholar
  99. GEO. 2016. Gene Expression Omnibus. Retrieved from http://www.ncbi.nlm.nih.gov/geo/.Google ScholarGoogle Scholar
  100. Deepak Ghodke. 2015. Bye Bye 2015: What lies ahead for BI. Retrieved from http://www.ciol.com/bye-bye-2015-what-lies-ahead-for-bi/.Google ScholarGoogle Scholar
  101. Github. 2016a. Data Science Colleges. Retrieved from https://github.com/ryanswanstrom/awesome-datascience-colleges.Google ScholarGoogle Scholar
  102. Github. 2016b. List of Recommender Systems. Retrieved from https://github.com/grahamjenson/list_of_recommender_systems.Google ScholarGoogle Scholar
  103. Michael Gold, Ryan McClarren, and Conor Gaughan. 2013. The lessons Oscar taught us: Data science and media 8 entertainment. Big Data 1, 2 (2013), 105--109. Google ScholarGoogle ScholarCross RefCross Ref
  104. Google. 2016a. Google Bigquery and Cloud Platform. Retrieved from https://cloud.google.com/bigquery/.Google ScholarGoogle Scholar
  105. Google. 2016b. Google Cloud Prediction API. Retrieved from https://cloud.google.com/prediction/docs/.Google ScholarGoogle Scholar
  106. Google. 2016c. Google Online Open Education. Retrieved from https://www.google.com/edu/openonline/.Google ScholarGoogle Scholar
  107. Google. 2016d. Google Trends. (2016). https://www.google.com.au/trends/explore#q=datalyticsz=Etc Retrieved on 14 Novermber 2016.Google ScholarGoogle Scholar
  108. Google. 2016e. Open Mobile Data. Retrieved from https://console.developers.google.com/storage/browser/openmobiledata_public/.Google ScholarGoogle Scholar
  109. Beijing Municipal Government. 2016. Beijing Big Data and Cloud Computing Development Action Plan. Retrieved from http://zhengwu.beijing.gov.cn/gh/dt/t1445533.htm.Google ScholarGoogle Scholar
  110. China Government. 2015. China Big Data. Retrieved from http://www.gov.cn/zhengce/content/2015-09/05/content_10137.htm.Google ScholarGoogle Scholar
  111. Matthew J. Graham. 2012. The art of data science. In Astrostatistics and Data Mining,Springer Series in Astrostatistics, Vol. 2. 47--59. Google ScholarGoogle ScholarCross RefCross Ref
  112. Jim Gray. 2007. eScience—A Transformed Scientific Method. Retrieved from http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt.Google ScholarGoogle Scholar
  113. GTD. 2016. Global Terrorism Database. Retrieved from https://www.start.umd.edu/gtd/.Google ScholarGoogle Scholar
  114. Akash Gupta, Ahmet Cecen, Sharad Goyal, Amarendra K. Singh, and Surya R. Kalidindi. 2015. Structure-property linkages using a data science approach: Application to a non-metallic inclusion/steel composite system. Acta Materialia 91 (2015), 239--254. Google ScholarGoogle ScholarCross RefCross Ref
  115. David J. Hand. 2015. Statistics and computing: The genesis of data science. Statistics and Computing 25, 4 (2015), 705--711. Google ScholarGoogle ScholarDigital LibraryDigital Library
  116. Hardin. 2016. Github. Retrieved from hardin47.github.io/DataSciStatsMaterials/.Google ScholarGoogle Scholar
  117. Johanna Hardin, Roger Hoerl, Nicholas J. Horton, and Deborah Nolan. 2015. Data science in statistics curricula: Preparing students to “Think with Data”. The American Statistician 69, 4 (2015), 343--353. Google ScholarGoogle ScholarCross RefCross Ref
  118. Harlan Harris, Sean Murphy, and Marck Vaisman. 2013. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media. Google ScholarGoogle ScholarDigital LibraryDigital Library
  119. Benjamin T. Hazena, Christopher A. Booneb, Jeremy D. Ezellc, and L. Allison Jones-Farmer. 2014. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics 154 (2014), 72--80. Google ScholarGoogle ScholarCross RefCross Ref
  120. Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Retrieved from http://research.microsoft.com/en-us/collaboration/fourthparadigm/.Google ScholarGoogle Scholar
  121. Tony Hey and Anne Trefethen. 2003. The Data Deluge: An e-Science Perspective. John Wiley 8 Sons, Ltd, 809--824.Google ScholarGoogle Scholar
  122. HLSG. 2010. Final report of the high level expert group on scientific data. http://ec.europa.eu/information_society/newsroom/cf/document.cfm?action=display8doc_id=707.Google ScholarGoogle Scholar
  123. HLSG. 2014. An RDA Europe Report. Retrieved from http://www.e-nformation.ro/wp-content/uploads/2014/12/TheDataHarvestReport_-Final.pdf.Google ScholarGoogle Scholar
  124. Horizon. 2014. European Commission Horizon 2020 Big Data Private Public Partnership. Retrieved from http://ec.europa.eu/programmes/horizon2020/en/h2020-section/information-and-communication-technologies.Google ScholarGoogle Scholar
  125. Peter J. Huber. 2011. Data Analysis: What Can Be Learned From the Past 50 Years. John Wiley 8 Sons. Google ScholarGoogle ScholarDigital LibraryDigital Library
  126. IASC. 1977. International Association for Statistical Computing. (1977). http://www.iasc-isi.org/.Google ScholarGoogle Scholar
  127. IBM. 2010. Capitalizing on Complexity. Retrieved from http://www-935.ibm.com/services/us/ceo/ceostudy2010/multimedia.html.Google ScholarGoogle Scholar
  128. IBM. 2016a. IBM Analytics and Big Data. Retrieved from http://www.ibm.com/analytics/us/en/orhttp://www-01.ibm.com/software/data/bigdata/.Google ScholarGoogle Scholar
  129. IBM. 2016b. What is a Data Scientist? Retrieved from http://www-01.ibm.com/software/data/infosphere/data-scientist/.Google ScholarGoogle Scholar
  130. IDA. 2014. International Institute of Data 8 Analytics. Retrieved from www.datasciences.org.Google ScholarGoogle Scholar
  131. IEEEBD. 2014. IEEE Big Data Initiative. (2014). http://bigdata.ieee.org/.Google ScholarGoogle Scholar
  132. IFSC-96. 1996. Data Science, Classification, and Related Methods. Retrieved from http://d-nb.info/955715512/04.Google ScholarGoogle Scholar
  133. IJDS. 2016. International Journal of Data Science. (2016). http://www.inderscience.com/jhome.php?jcode=ijds.Google ScholarGoogle Scholar
  134. IJRDS. 2017. International Journal of Research on Data Science. Retrieved from http://www.sciencepublishinggroup.com/journal/index?journalid=310.Google ScholarGoogle Scholar
  135. INFORMS. 2014. Candidate Handbook. Retrieved from https://www.informs.org/Certification-Continuing-Ed/Analytics-Certificati on/Candidate-Handbook.Google ScholarGoogle Scholar
  136. INFORMS. 2016. Institute for Operations Research and the Management Sciences. Retrieved from https://www.informs.org/.Google ScholarGoogle Scholar
  137. Shuichi Iwata. 2008. Scientific “agenda” of data science. Data Science Journal 7, 5 (2008), 54--56. Google ScholarGoogle ScholarCross RefCross Ref
  138. H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. 2014. Big data and its technical challenges. Communications of the ACM 57, 7 (2014), 86--94. Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. H. V. Jagadish. 2015. Big data and science: Myths and reality. Big Data Research 2, 2 (2015), 49--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  140. JDS. 2002. Journal of Data Science. Retrieved from http://www.jds-online.com/.Google ScholarGoogle Scholar
  141. JDSA. 2015. International Journal of Data Science and Analytics (JDSA). Retrieved from http://www.springer.com/41060.Google ScholarGoogle Scholar
  142. JFDS. 2016. The Journal of Finance and Data Science. Retrieved from http://www.keaipublishing.com/en/journals/the-journal-of-finance-and-data-science/.Google ScholarGoogle Scholar
  143. Kaggle. 2016. Kaggle Competition Data. Retrieved from https://www.kaggle.com/competitions.Google ScholarGoogle Scholar
  144. Surya R. Kalidindi. 2015. Data science and cyberinfrastructure: Critical enablers for accelerated development of hierarchical materials. International Materials Reviews 60, 3 (2015), 150--168. Google ScholarGoogle ScholarCross RefCross Ref
  145. KDD89. 1989. IJCAI-89 Workshop on Knowledge Discovery in Databases. Retrieved from http://www.kdnuggets.com/meetings/kdd89/index.html.Google ScholarGoogle Scholar
  146. KDnuggets. 2015. Visualization Software. Retrieved from http://www.kdnuggets.com/software/visualization.html.Google ScholarGoogle Scholar
  147. Kdnuggets. 2016. Kdnuggets. Retrieved from http://www.kdnuggets.com/.Google ScholarGoogle Scholar
  148. K Kelly. 2012. The quantified century. In Quantified Self Conference. Retrieved from http://quantifiedself.com/conference/Palo-Alto-2012.Google ScholarGoogle Scholar
  149. Nawsher Khan, Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, and et al. 2014. Big data: Survey, technologies, opportunities, and challenges. The Scientific World Journal 2014 (2014), 18. Google ScholarGoogle ScholarCross RefCross Ref
  150. John King and Roger Magoulas. 2015. 2015 Data Science Salary Survey. Retrieved from http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf.Google ScholarGoogle Scholar
  151. Ron Kohavi, Neal J. Rothleder, and Evangelos Simoudis. 2002. Emerging trends in business analytics. Communications of the ACM 45, 8 (2002), 45--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  152. AMP Lab. 2016. MLBase. Retrieved from http://mlbase.org/.Google ScholarGoogle Scholar
  153. Alexandros Labrinidis and H. V. Jagadish. 2012. Challenges and opportunities with big data. Proceedings of the VLDB Endowment 5, 12 (2012), 2032--2033. Google ScholarGoogle ScholarDigital LibraryDigital Library
  154. Douglas Laney. 2001. 3D Data Management: Controlling Data Volume, Velocity and Variety. Technical Report, META Group.Google ScholarGoogle Scholar
  155. David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. The parable of Google flu: Traps in big data analysis. Science 343 (2014), 1203--1205. Google ScholarGoogle ScholarCross RefCross Ref
  156. LDC. 2016. Linguistic Data Consortium. Retrieved from https://www.ldc.upenn.edu/about.Google ScholarGoogle Scholar
  157. LinkedIn. 2016. LinkedIn Jobs. Retrieved from https://www.linkedin.com/jobs/data-scientist-jobs.Google ScholarGoogle Scholar
  158. Mike Loukides. 2011. The Evolution of Data Products. O’Reilly, Cambridge.Google ScholarGoogle Scholar
  159. Mike Loukides. 2012. What is Data Science? O’Reilly Media, Sebastopol, CA. http://radar.oreilly.com/2010/06/what-is-data-science.htmldata-scientists.Google ScholarGoogle Scholar
  160. Andrea Manieri, Steve Brewer, Ruben Riestra, Yuri Demchenko, Matthias Hemmje, Tomasz Wiktorski, Tiziana Ferrari, and Jrmy Frey. 2015. Data science professional uncovered: How the EDISON project will contribute to a widely accepted profile for data scientists. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom’15). 588--593. Google ScholarGoogle ScholarDigital LibraryDigital Library
  161. Kate Matsudaira. 2015. The science of managing data science. Communications of the ACM 58, 6 (2015), 44--47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. McKinsey. 2011. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.Google ScholarGoogle Scholar
  163. Claire Cain Miller. 2013. Data science: The numbers of our lives. New York Times Retrieved from http://www.nytimes.com/2013/04/14/education/edlife/universities-offer-courses-in-a-hot-new-field-data-science.html?pagewanted=all8_r=0.Google ScholarGoogle Scholar
  164. Arthur John Havart Morrell (Ed.). 1968. Information processing. In Proceedings of IFIP Congress 1968. Edinburgh, UK.
  165. Peter Murray-Rust. 2007. Data-driven science: A scientist’s view. In NSF/JISC 2007 Digital Repositories Workshop. http://www.sis.pitt.edu/repwkshop/papers/murray.pdf.
  166. Peter Naur. 1968. ‘Datalogy’, the science of data and data processes. In Proceedings of IFIP Congress 1968, 1383--1387.
  167. Peter Naur. 1974. Concise Survey of Computer Methods. Studentlitteratur, Lund, Sweden.
  168. NCSU. 2007a. Institute for Advanced Analytics, North Carolina State University. Retrieved from http://analytics.ncsu.edu/.
  169. NCSU. 2007b. Master of Science in Analytics, Institute for Advanced Analytics, North Carolina State University. Retrieved from http://analytics.ncsu.edu/.
  170. Michael L. Nelson. 2009. Data-driven science: A new paradigm? EDUCAUSE Review 44, 4 (2009), 6--7.
  171. NICTA. 2016. National ICT Australia. Retrieved from https://www.nicta.com.au/.
  172. NIST. 2015. NIST Text Retrieval Conference Data. Retrieved from http://trec.nist.gov/data.html.
  173. NSB. 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Retrieved from http://www.nsf.gov/pubs/2005/nsb0540/.
  174. NSF. 2007. US NSF07-28. Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf.
  175. OECD. 2007. OECD Principles and Guidelines for Access to Research Data from Public Funding. Retrieved from https://www.oecd.org/sti/sci-tech/38500813.pdf.
  176. OPENedX. 2016. OPENedX Online Education Platform. Retrieved from https://open.edx.org/.
  177. Tim O’Reilly. 2005. What is Web 2.0. Retrieved from http://oreilly.com/pub/a/web2/archive/what-is-web-20.html?page=3.
  178. D. J. Patil. 2011. Building Data Science Teams. O’Reilly Media.
  179. Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V. Weber. 1993. Capability maturity model version 1.1. IEEE Software 10, 4 (1993), 18--27.
  180. Gil Press. 2013. A Very Short History of Data Science. Retrieved from http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/61ae3ebb69fd.
  181. Xuesen Qian. 1991. Revisiting issues on open complex giant systems. International Journal of Pattern Recognition and Artificial Intelligence 4, 1 (1991), 5--8.
  182. Xuesen Qian, Jingyuan Yu, and Ruwei Dai. 1993. A new discipline of science—The study of open complex giant system and its methodology. Chinese Journal of Systems Engineering & Electronics 4, 2 (1993), 2--12.
  183. RapidMiner. 2016. RapidMiner. Retrieved from https://rapidminer.com/.
  184. Samantha Renae. 2011. Data analytics: Crunching the future. Bloomberg Businessweek, September 8, 2011.
  185. Solutions Review. 2016. Data Integration and Application Integration Solutions Directory. Retrieved from http://solutionsreview.com/data-integration/data-integration-solutions-directory/.
  186. C. Rudin, D. Dunson, R. Irizarry, H. Ji, E. Laber, J. Leek, T. McCormick, Sherri Rose, C. Schafer, M. van der Laan, L. Wasserman, and L. Xue. 2014. Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society. American Statistical Association. Retrieved from http://www.amstat.org/policy/pdfs/BigDataStatisticsJune2014.pdf.
  187. SAS. 2013. Big Data Analytics: An Assessment of Demand for Labour and Skills, 2012-2017. Report. SAS/The Tech Partnership. Retrieved from https://www.thetechpartnership.com/globalassets/pdfs/research-2014/bigdata_report_nov14.pdf.
  188. SAS. 2016. SAS. Retrieved from http://www.sas.com/en_us/insights.html.
  189. Tobias Schoenherr and Cheri Speier-Pero. 2015. Data science, predictive analytics, and big data in supply chain management: Current state and future potential. Journal of Business Logistics 36, 1 (2015), 120--132.
  190. SIAM. 2016. SIAM career center. Retrieved from http://jobs.siam.org/home/.
  191. Christoph Siart, Simon Kopp, and Jochen Apel. 2015. The interface between data science, research assessment and science support—Highlights from the German perspective and examples from Heidelberg University. In Proceedings of the 2015 IIAI 4th International Congress on Advanced Applied Informatics (IIAI-AAI’15). 472--476.
  192. Silk. 2016. Data Science University Programs. Retrieved from http://data-science-university-programs.silk.co/.
  193. Larry Smarr. 2012. Quantifying your body: A how-to guide from a systems biology perspective. Biotechnology Journal 7, 8 (2012), 980--991.
  194. F. Jack Smith. 2006. Data science as an academic discipline. Data Science Journal 5 (2006), 163--164.
  195. SSDS. 2015. Springer Series in the Data Sciences. Retrieved from http://www.springer.com/series/13852.
  196. Stanford. 2014. Stanford Data Science Initiatives, Stanford University. Retrieved from https://sdsi.stanford.edu/.
  197. Thomas R. Stewart and Claude McMillan, Jr. 1987. Descriptive and prescriptive models for judgment and decision making: Implications for knowledge engineering. In Expert Judgment and Expert Systems, Jeryl L. Mumpower, Ortwin Renn, Lawrence D. Phillips, and V. R. R. Uppuluri (Eds.). Springer-Verlag, London, 305--320.
  198. Michael Stonebraker, Sam Madden, and Pradeep Dubey. 2013. Intel ‘big data’ science and technology center vision and execution plan. SIGMOD Record 42, 1 (2013), 44--49.
  199. Alma Swan and Sheridan Brown. 2008. The skills, role & career structure of data scientists & curators: Assessment of current practice & future needs. Technical Report. University of Southampton.
  200. Melanie Swan. 2013. The quantified self: Fundamental disruption in big data science and biological discovery. Big Data 1, 2 (2013), 85--99.
  201. Technavio. 2016. Top 10 Healthcare Data Analytics Companies. Retrieved from http://www.technavio.com/blog/top-10-healthcare-data-analytics-companies.
  202. TFDSAA. 2013. IEEE Task Force on Data Science and Advanced Analytics. Retrieved from http://dsaatf.dsaa.co/.
  203. TOBD. 2015. IEEE Transactions on Big Data. Retrieved from https://www.computer.org/web/tbd.
  204. Predictive Analytics Today. 2016. 29 Data Preparation Tools and Platforms. Retrieved from http://www.predictiveanalyticstoday.com/data-preparation-tools-and-platforms/.
  205. John W. Tukey. 1962. The future of data analysis. The Annals of Mathematical Statistics 33, 1 (1962), 1--67.
  206. John W. Tukey. 1977. Exploratory Data Analysis. Pearson.
  207. Tutiempo. 2016. Global Climate Data. Retrieved from http://en.tutiempo.net/climate.
  208. UCI. 2016. UCI Machine Learning Repository. Retrieved from archive.ics.uci.edu/ml/.
  209. Udacity. 2016. Udacity Courses. Retrieved from https://www.udacity.com/courses/data-science.
  210. Udemy. 2016. Udemy Courses. Retrieved from https://www.udemy.com/courses/search/?ref=home&src=ukw&q=data+science&lang=en.
  211. UK. 2016. UK Big Data. Retrieved from http://www.rcuk.ac.uk/research/infrastructure/big-data/.
  212. UK-HM. 2012. UK HM Government. Retrieved from http://data.gov.uk/sites/default/files/Open_data_White_Paper.pdf.
  213. UK-OD. 2016. UK Open Data. Retrieved from http://data.gov.uk/.
  214. UMichi. 2015. Michigan Institute For Data Science, University of Michigan. Retrieved from http://midas.umich.edu/.
  215. UN. 2010. United Nations Global Pulse Projects. Retrieved from http://www.unglobalpulse.org/.
  216. US-OD. 2016. US Government Open Data. Retrieved from https://www.data.gov/.
  217. USD2D. 2016. US National Consortium for Data Science. Retrieved from data2discovery.org.
  218. USDSC. 2016. US Degree Programs in Analytics and Data Science. Retrieved from http://analytics.ncsu.edu/?page_id=4184.
  219. USNSF. 2012. US Big Data Research Initiative. Retrieved from http://www.nsf.gov/cise/news/bigdata.jsp.
  220. UTS. 2011. Master of Analytics (Research) and Doctor of Philosophy Thesis: Analytics, Advanced Analytics Institute, University of Technology Sydney. Retrieved from http://www.uts.edu.au/research-and-teaching/our-research/advanced-analytics-institute/education-and-research-opportuniti-1.
  221. UTSAAI. 2011. Advanced Analytics Institute, University of Technology Sydney. Retrieved from https://analytics.uts.edu.au/.
  222. David van Dyk, Montse Fuentes, Michael I. Jordan, Michael Newton, Bonnie K. Ray, Duncan Temple Lang, and Hadley Wickham. 2015. ASA Statement on the Role of Statistics in Data Science. Retrieved from http://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/.
  223. Vast. 2016. Visual Analytics Community. Retrieved from http://vacommunity.org/HomePage.
  224. Dan Vesset, Benjamin Woo, Henry D. Morris, Richard L. Villars, Gard Little, Jean S. Bozman, Lucinda Borovick, Carl W. Olofson, Susan Feldman, Steve Conway, Matthew Eastwood, and Natalya Yezhkova. 2012. Worldwide Big Data Technology and Services 2012-2015 Forecast. IDC.
  225. Ana Viseu and Lucy Suchman. 2010. Wearable Augmentations: Imaginaries of the Informed Body. Berghahn Books, New York, 161--184.
  226. Whitehouse. 2015. The White House Names Dr. D. J. Patil as the First U.S. Chief Data Scientist. Retrieved from https://www.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist.
  227. Wikipedia. 2016a. Comparison of Cluster Software. Retrieved from https://en.wikipedia.org/wiki/Comparison_of_cluster_software.
  228. Wikipedia. 2016b. Informatics. Retrieved from https://en.wikipedia.org/wiki/Informatics.
  229. Wikipedia. 2016c. List of Reporting Software. Retrieved from https://en.wikipedia.org/wiki/List_of_reporting_software.
  230. WIRED. 2014. How Europe can Seize the Starring Role in Big Data. Retrieved from www.wired.com/insights/2014/09/europe-big-data/.
  231. Gary Wolf. 2012. The data-driven life. New York Times. Retrieved from www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html.
  232. Jeff Wu. 1997. Statistics = Data Science? Retrieved from http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf.
  233. Yahoo. 2016. Yahoo Finance. Retrieved from finance.yahoo.com.
  234. Nathan Yau. 2009. Rise of the Data Scientist. Retrieved from http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/.
  235. Chris Yiu. 2012. The Big Data Opportunity. Retrieved from http://www.policyexchange.org.uk/images/publications/thepportunity.pdf.
  236. Bin Yu. 2014. IMS presidential address: Let us own data science. IMS Bulletin Online, October 1, 2014.