Abstract
The 21st century has ushered in the age of big data and the data economy, in which data DNA, the carrier of important knowledge, insights, and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics. Although it is still widely debated whether big data is only hype and buzz, and although data science remains in a very early phase, significant challenges and opportunities are emerging from, or have been inspired by, the research, innovation, business, profession, and education of data science. This article provides a comprehensive survey and tutorial of the fundamental aspects of data science: the evolution from data analysis to data science, the data science concepts, a big picture of the era of data science, the major challenges and directions in data innovation, the nature of data analytics, new industrialization and service opportunities in the data economy, the profession and competency of data education, and the future of data science. This article is the first in the field to draw a comprehensive big picture, in addition to offering rich observations, lessons, and thinking about data science and analytics.
- ACEMS. 2014. The Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers. Retrieved from acems.org.au/.Google Scholar
- Ritu Agarwal and Vasant Dhar. 2014. Editorial-big data, data science, and analytics: The opportunity and challenge for IS research. Information Systems Research 25, 3 (2014), 443--448. Google Scholar
Digital Library
- Xinhua News Agency. 2016. The 13th Five-Year Plan for the National Economic and Social Development of the People’s Republic of China. Retrieved from http://news.xinhuanet.com/politics/2016lh/2016-03/17/c_1118366322.htm.Google Scholar
- AGIMO. 2013. AGIMO Big Data Strategy - Issues Paper. Retrieved from www.finance.gov.au/files/2013/03/Big-Data-Strategy-Issues-Paper1.pdf.Google Scholar
- Paul E. Anderson, James F. Bowring, Rene McCauley, George Pothering, and Christopher W. Starr. 2014. An undergraduate degree in data science: Curriculum and a decade of implementation experience. In Proceedings of the 45th ACM Technical Symposium on Computer Science Education (SIGCSE’14). 145--150. Google Scholar
Digital Library
- ASA. 2015. ASA views on data science. Retrieved from http://magazine.amstat.org/?s=data+science8x=08y=0.Google Scholar
- AU. 1990. Data-matching Program. Retrieved from http://www.comlaw.gov.au/Series/C2004A04095.Google Scholar
- AU. 2010. Declaration of Open Government. Retrieved from http://agimo.gov.au/2010/07/16/declaration-of-open-government/.Google Scholar
- AU. 2013. Attorney-General’s Department. Retrieved from http://www.attorneygeneral.gov.au/Mediareleases/Pages/2013/Seconder/22May2013-AustraliajoinsOpenGovernmentPartnership.aspx.Google Scholar
- AU. 2016. Australia Big Data. Retrieved from http://www.finance.gov.au/big-data/.Google Scholar
- Kayode Ayankoya, André P. Calitz, and Jean Greyling. 2014. Intrinsic relations between data science, big data, business analytics and datafication. ACM International Conference Proceeding Series 28 (2014), 192--198. Google Scholar
Digital Library
- John Bailer, Roger Hoer, David Madigan, Jill Montaquila, and Tommy Wright. 2012. Report of the ASA workgroup on master’s degrees. Retrieved from http://magazine.amstat.org/wp-content/uploads/2013an/masterworkgroup.pdf.Google Scholar
- Ben Baumer. 2015. A data science course for undergraduates: Thinking with data. The American Statistician 69, 4 (2015), 334--342. Google Scholar
Cross Ref
- BDL. 2016a. Big Data Landscape. Retrieved from www.bigdatalandscape.com.Google Scholar
- BDL. 2016b. Big Data Landscape 2016 (Version 3.0). Retrieved from http://mattturck.com/2016/02/01/big-data-landscape/.Google Scholar
- Mark A. Beyer and Douglas Laney. 2012. The Importance of ‘Big Data’: A Definition. Retrieved from https://www.gartner.com/doc/2057415 Gartner.Google Scholar
- Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshp, Aaron J. Elmore, Samuel Madden, and Aditya Parameswaran. 2015. Datahub: Collaborative data science 8 dataset version management at scale. In CIDR.Google Scholar
- BigML. 2016. BigML. Retrieved from https://bigml.com/.Google Scholar
- Kirk D. Borne, Suzanne Jacoby, Karen Carney, Andy Connolly, Timothy Eastman, M. Jordan Raddick, J. A. Tyson, and John Wallin. 2010. The revolution in astronomy education: Data science for the masses. Retrieved from http://arxiv.org/pdf/0909.3895v1.pdf.Google Scholar
- Sebastien Boyer, Ben U. Gelman, Benjamin Schreck, and Kalyan Veeramachaneni. 2015. Data science foundry for MOOCs. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA’15). 1--10. Google Scholar
Cross Ref
- Leo Breiman. 2001. Statistical modeling: The two cultures. Statistical Science 16, 3 (2001), 199--231. Google Scholar
Cross Ref
- Gavin Brown. 2009. Review of Education in Mathematics, Data Science and Quantitative Disciplines: Report to the Group of Eight Universities. Retrieved from https://go8.edu.au/publication/go8-review-education-mathematics-data-scie nce-and-quantitative-disciplines.Google Scholar
- Linda Burtch. 2014. The Burtch Works Study: Salaries of Data Scientists. Retrieved from http://www.burtchworks.com/files/2014/07/Burtch-Works-Study_DS_final.pdf.Google Scholar
- Kanyarat Bussaban and Phanu Waraporn. 2015. Preparing undergraduate students majoring in computer science and mathematics with data science perspectives and awareness in the age of big data. In Proceedings of the 7th World Conference on Educational Sciences, Vol. 197. 1443--1446. Google Scholar
Cross Ref
- CA. 2016. Canada Capitalizing on Big Data. http://www.sshrc-crsh.gc.ca/news_room-salle_de_presse/latest_news-nouvell es_recentes/big_data_consultation-donnees_massives_consultation-eng.aspx.Google Scholar
- Longbing Cao. 2010a. Domain driven data mining: Challenges and prospects. IEEE Transactions on Knowledge and Data Engineering 22, 6 (2010), 755--769. Google Scholar
Digital Library
- Longbing Cao. 2010b. In-depth behavior understanding and use: The behavior informatics approach. Information Science 180, 17 (2010), 3067--3085. Google Scholar
Digital Library
- Longbing Cao. 2011. Strategic Recommendations on Advanced Data Industry and Services for the Yanhuang Science and Technology Park.Google Scholar
- Longbing Cao. 2014. Non-IIDness learning in behavioral and social data. The Computer Journal 57, 9 (2014), 1358--1370. Google Scholar
Cross Ref
- Longbing Cao. 2015a. Coupling learning of complex interactions. Journal of Information Processing and Management 51, 2 (2015), 167--186. Google Scholar
Cross Ref
- Longbing Cao. 2015b. Metasynthetic Computing and Engineering of Complex Systems. Springer. Google Scholar
Digital Library
- Longbing Cao. 2016a. Data science and analytics: A new era. International Journal of Data Science and Analytics 1, 1 (2016), 1--2. Google Scholar
Cross Ref
- Longbing Cao. 2016b. Data science: Challenges and directions. Technical Report, UTS Advanced Analytics Institute.Google Scholar
- Longbing Cao. 2016c. Data Science: Nature and Pitfalls. Technical Report, UTS Advanced Analytics Institute.Google Scholar
- Longbing Cao. 2016d. Data Science: Profession and Education. Technical Report, UTS Advanced Analytics Institute.Google Scholar
- Longbing Cao. 2017. Understand Data Science (to be published). Springer.Google Scholar
- Longbing Cao and Ruwei Dai. 2008. Open Complex Intelligent Systems. Post Telecom Press.Google Scholar
- Longbing Cao, Ruwei Dai, and Mengchu Zhou. 2009. Metasynthesis: M-space, m-interaction and m-computing for open complex giant systems. IEEE Transactions on Systems, Man, and Cybernetics--Part A 39, 5 (2009), 1007--1021. Google Scholar
Digital Library
- Longbing Cao and Philip S. Yu (Eds). 2012. Behavior Computing: Modeling, Analysis, Mining and Decision. Springer. Google Scholar
Digital Library
- Longbing Cao, Yuming Ou, and Philip S Yu. 2012. Coupled behavior analysis with applications. IEEE Transactions on Knowledge and Data Engineering 24, 8 (2012), 1378--1392. Google Scholar
Digital Library
- Longbing Cao, Philip S. Yu, Chengqi Zhang, and Yanchang Zhao. 2010. Domain Driven Data Mining. Springer. Google Scholar
Cross Ref
- Capterra. 2016a. Top Project Management Tools. Retrieved from http://www.capterra.com/project-management-software/.Google Scholar
- Capterra. 2016b. Top Reporting Software Products. Retrieved from http://www.capterra.com/reporting-software/.Google Scholar
- CBDIO. 2016. China Big Data Industrial Observation. Retrieved from www.cbdio.com.Google Scholar
- CCF-BDTF. 2013. China Computer Federation Task Force on Big Data. Retrieved from http://www.bigdataforum.org.cn/.Google Scholar
- John M. Chambers. 1993. Greater or lesser statistics: A choice for future research. Statistics and Computing 3, 4 (1993), 182--184. Google Scholar
Cross Ref
- Swami Chandrasekaran. 2013. Becoming a Data Scientist. Retrieved from http://nirvacana.com/thoughts/becoming-a-data-scientist/.Google Scholar
- Hsinchun Chen, Roger H. L. Chiang, and Veda C. Storey. 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly 36, 4 (2012), 1165--1188. Google Scholar
Cross Ref
- China Information Security. 2015. Big Data Strategies and Actions in Major Countries. Retrieved from http://www.cac.gov.cn/2015-07/03/c_1115812491.htm.Google Scholar
- Thomas R. Clancy, Kathryn H. Bowles, Lillee Gelinas, Ida Androwich, Connie Delaney, Susan Matney, Joyce Sensmeier, Judith Warren, John Welton, and Bonnie Westra. 2014. A call to action: Engage in big data science. Nursing Outlook 62, 1 (2014), 64--65. Google Scholar
Cross Ref
- Classcentral. 2016. Data Science and Big Data—Free Online Courses. Retrieved from https://www.class-central.com/subject/data-science.Google Scholar
- Kelly Clay. 2013. CES 2013: The Year of The Quantified Self? Retrieved from http://www.forbes.com/sites/kellyclay/2013/01/06/ces-2013-the-year-of-the-quantified-self/♯4cf4d2b55e74.Google Scholar
- William S. Cleveland. 2001. Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review 69, 1 (2001), 21--26. Google Scholar
Cross Ref
- CMIST. 2016. China Will Establish A Series of National Labs. Retrieved from http://news.sciencenet.cn/htmlnews/2016/4/344404.shtm.Google Scholar
- CNSF. 2015. National Science Foundation China. Retrieved from http://www.nsfc.gov.cn/.Google Scholar
- European Commission. 2014. Commission urges governments to embrace potential of big data. Retrieved from europa.eu/rapid/press-release_IP-14-769_en.htm.Google Scholar
- Coursera. 2016. Coursera. Retrieved from www.coursera.org/data-science.Google Scholar
- Kevin Crowston and Jian Qin. 2011. A capability maturity model for scientific data management: Evidence from the literature. Proceedings of the Association for Information Science and Technology 48, 10 (2011), 1--9. Google Scholar
Digital Library
- CSC. 2012. Big data universe beginning to explode. Retrieved from http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode.Google Scholar
- CSNSTC. 2009. Harnessing the Power of Digital Data for Science and Society: Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council. Retrieved from https://www.nitrd.gov/About/Harnessing_Power_Web.pdf.Google Scholar
- DABS. 2016. Data Analytics Book Series. Retrieved from http://www.springer.com/series/15063.Google Scholar
- DARPA. 2016. DARPA Xdata program. Retrieved from www.darpa.mil/program/xdata.Google Scholar
- Data61. 2016. Data61. Retrieved from https://www.data61.csiro.au/.Google Scholar
- DataRobot. 2016. DataRobot. Retrieved from https://www.datarobot.com/.Google Scholar
- Datasciences.org. 2005. Homepage. Retrieved from www.datasciences.org.Google Scholar
- Thomas H. Davenport and D. J. Patil. 2012. Data scientist: The sexiest job of the 21st century. Harvard Business Review (2012), 70--76.Google Scholar
- Jessica Davis. 2016. 10 Programming Languages And Tools Data Scientists Used. Retrieved from http://www.informationweek.com/devops/programming-languages/10-programming-languages-and-tools-data-scientists-use-now/d/d-id/1326034.Google Scholar
- Devendra Desale. 2015. Top 30 Social Network Analysis and Visualization Tools. Retrieved from http://www.kdnuggets.com/2015/06/top-30-social-network-analysis-visualization-tools.html.Google Scholar
- Vasant Dhar. 2013. Data science and prediction. Communications of the ACM 56, 12 (2013), 64--73. Google Scholar
Digital Library
- Herman A. Dierick and Fabrizio Gabbiani. 2015. Drosophila neurobiology: No escape from ‘Big Data’ science. Current Biology 25, 14 (2015), 606--608. Google Scholar
Cross Ref
- Peter J. Diggle. 2015. Statistics: A data science for the 21st century. Journal of the Royal Statistical Society: Series A (Statistics in Society) 178, 4 (2015), 793--813. Google Scholar
Cross Ref
- David Donoho. 2015. 50 years of Data Science. Retrieved from http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf.Google Scholar
- Bonnie J. Dorr, Craig S. Greenberg, Peter Fontana, Mark A. Przybocki, Marion Le Bras, Cathryn A. Ploehn, Oleg Aulov, Martial Michel, E. Jim Golden, and Wo Chang. 2015. The NIST data science initiative. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA’15). 1--10. Google Scholar
Cross Ref
- DSA. 2016. Data Science Association. Retrieved from http://www.datascienceassn.org/.Google Scholar
- DSAA. 2014. IEEE/ACM/ASA International Conference on Data Science and Advanced Analytics. Retrieved from www.dsaa.co.Google Scholar
- DSC. 2016a. College 8 University Data Science Degrees. Retrieved from http://datascience.community/colleges.Google Scholar
- DSC. 2016b. The Data Science Community. Retrieved from http://datasciencebe.com/.Google Scholar
- DSCentral. 2016. Data Science Central. Retrieved from http://www.datasciencecentral.com/.Google Scholar
- DSE. 2015. Data Science and Engineering. Retrieved from http://link.springer.com/journal/41019.Google Scholar
- DSJ. 2014. Data Science Journal.Retrieved from datascience.codata.org.Google Scholar
- DSKD. 2007. Data Science and Knowledge Discovery Lab, UTS. Retrieved from http://www.uts.edu.au/research-and-teaching/our-research/quantum-computation-and-intelligent-systems/data-sciences-and.Google Scholar
- David Ewing Duncan. 2009. Experimental Man: What One Man’s Body Reveals about His Future, Your Health, and Our Toxic World. Wiley 8 Sons, New York.Google Scholar
- Edx. 2016. EDX Courses. Retrieved from https://www.edx.org/course?search_query=data+science.Google Scholar
- EMC. 2011. Data science revealed: A data-driven glimpse into the burgeoning new field. Retrieved from www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf.Google Scholar
- EPJDS. 2012. EPJ Data Science. Retrieved from http://epjdatascience.springeropen.com/.Google Scholar
- EU. 2014. EU Towards a Thriving Data-Driven Economy. Retrieved from https://ec.europa.eu/digital-single-market/en/towards-thriving-data-driven-economy.Google Scholar
- EU-DSA. 2016. The European Data Science Academy. Retrieved from edsa-project.eu.Google Scholar
- EU-OD. 2016. The European Union Open Data Portal. Retrieved from https://open-data.europa.eu/.Google Scholar
- Facebook. 2016. Facebook Data. Retrieved from https://www.facebook.com/careers/teams/data/.Google Scholar
- James H. Faghmous and Vipin Kumar. 2014. A big data guide to understanding climate change: The case for theory-guided data science. Big Data 2, 3 (2014), 155--163. Google Scholar
Cross Ref
- Joshua Fairfielda and Hannah Shteina. 2014. Big data, big problems: Emerging issues in the ethics of data science and journalism. Journal of Mass Media Ethics 29, 1 (2014), 38--51. Google Scholar
Cross Ref
- Jack Faris, Evelyne Kolker, Alex Szalay, Leon Bradlow, Ewa Deelman, Wu Feng, Judy Qiu, Donna Russell, Elizabeth Stewart, and Eugene Kolker. 2011. Communication and data-intensive science in the beginning of the 21st century. A Journal of Integrative Biology 15, 4 (2011), 213--215.Google Scholar
Cross Ref
- Tom Fawcett. 2016. Mining the quantified self: Personal knowledge discovery as a challenge for data science. Big Data 3, 4 (2016), 249--266. Google Scholar
Cross Ref
- Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. From data mining to knowledge discovery in databases. AI Magazine 17, 3 (1996), 37--54.Google Scholar
Digital Library
- William Finzer. 2013. The data science education dilemma. Technology Innovations in Statistics Education 7, 2 (2013).Google Scholar
- Geoffrey Fox, Siddharth Maini, Howard Rosenbaum, and David J. Wild. 2015. Data science and online education. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom’15). 582--587. Google Scholar
Digital Library
- Peter Fox and James Hendler. 2014. The science of data science. Big Data 2, 2 (2014), 68--70. Google Scholar
Cross Ref
- Molly Galetto. 2016. Top 50 Data Science Resources. Retrieved from http://www.ngdata.com/top-data-science-resources/?.Google Scholar
- GEO. 2016. Gene Expression Omnibus. Retrieved from http://www.ncbi.nlm.nih.gov/geo/.Google Scholar
- Deepak Ghodke. 2015. Bye Bye 2015: What lies ahead for BI. Retrieved from http://www.ciol.com/bye-bye-2015-what-lies-ahead-for-bi/.Google Scholar
- Github. 2016a. Data Science Colleges. Retrieved from https://github.com/ryanswanstrom/awesome-datascience-colleges.Google Scholar
- Github. 2016b. List of Recommender Systems. Retrieved from https://github.com/grahamjenson/list_of_recommender_systems.Google Scholar
- Michael Gold, Ryan McClarren, and Conor Gaughan. 2013. The lessons Oscar taught us: Data science and media 8 entertainment. Big Data 1, 2 (2013), 105--109. Google Scholar
Cross Ref
- Google. 2016a. Google Bigquery and Cloud Platform. Retrieved from https://cloud.google.com/bigquery/.Google Scholar
- Google. 2016b. Google Cloud Prediction API. Retrieved from https://cloud.google.com/prediction/docs/.Google Scholar
- Google. 2016c. Google Online Open Education. Retrieved from https://www.google.com/edu/openonline/.Google Scholar
- Google. 2016d. Google Trends. (2016). https://www.google.com.au/trends/explore#q=datalyticsz=Etc Retrieved on 14 Novermber 2016.Google Scholar
- Google. 2016e. Open Mobile Data. Retrieved from https://console.developers.google.com/storage/browser/openmobiledata_public/.Google Scholar
- Beijing Municipal Government. 2016. Beijing Big Data and Cloud Computing Development Action Plan. Retrieved from http://zhengwu.beijing.gov.cn/gh/dt/t1445533.htm.Google Scholar
- China Government. 2015. China Big Data. Retrieved from http://www.gov.cn/zhengce/content/2015-09/05/content_10137.htm.Google Scholar
- Matthew J. Graham. 2012. The art of data science. In Astrostatistics and Data Mining,Springer Series in Astrostatistics, Vol. 2. 47--59. Google Scholar
Cross Ref
- Jim Gray. 2007. eScience—A Transformed Scientific Method. Retrieved from http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt.Google Scholar
- GTD. 2016. Global Terrorism Database. Retrieved from https://www.start.umd.edu/gtd/.Google Scholar
- Akash Gupta, Ahmet Cecen, Sharad Goyal, Amarendra K. Singh, and Surya R. Kalidindi. 2015. Structure-property linkages using a data science approach: Application to a non-metallic inclusion/steel composite system. Acta Materialia 91 (2015), 239--254. Google Scholar
Cross Ref
- David J. Hand. 2015. Statistics and computing: The genesis of data science. Statistics and Computing 25, 4 (2015), 705--711. Google Scholar
Digital Library
- Hardin. 2016. Github. Retrieved from hardin47.github.io/DataSciStatsMaterials/.Google Scholar
- Johanna Hardin, Roger Hoerl, Nicholas J. Horton, and Deborah Nolan. 2015. Data science in statistics curricula: Preparing students to “Think with Data”. The American Statistician 69, 4 (2015), 343--353. Google Scholar
Cross Ref
- Harlan Harris, Sean Murphy, and Marck Vaisman. 2013. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media. Google Scholar
Digital Library
- Benjamin T. Hazena, Christopher A. Booneb, Jeremy D. Ezellc, and L. Allison Jones-Farmer. 2014. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics 154 (2014), 72--80. Google Scholar
Cross Ref
- Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Retrieved from http://research.microsoft.com/en-us/collaboration/fourthparadigm/.Google Scholar
- Tony Hey and Anne Trefethen. 2003. The Data Deluge: An e-Science Perspective. John Wiley 8 Sons, Ltd, 809--824.Google Scholar
- HLSG. 2010. Final report of the high level expert group on scientific data. http://ec.europa.eu/information_society/newsroom/cf/document.cfm?action=display8doc_id=707.Google Scholar
- HLSG. 2014. An RDA Europe Report. Retrieved from http://www.e-nformation.ro/wp-content/uploads/2014/12/TheDataHarvestReport_-Final.pdf.Google Scholar
- Horizon. 2014. European Commission Horizon 2020 Big Data Private Public Partnership. Retrieved from http://ec.europa.eu/programmes/horizon2020/en/h2020-section/information-and-communication-technologies.Google Scholar
- Peter J. Huber. 2011. Data Analysis: What Can Be Learned From the Past 50 Years. John Wiley 8 Sons. Google Scholar
Digital Library
- IASC. 1977. International Association for Statistical Computing. (1977). http://www.iasc-isi.org/.Google Scholar
- IBM. 2010. Capitalizing on Complexity. Retrieved from http://www-935.ibm.com/services/us/ceo/ceostudy2010/multimedia.html.Google Scholar
- IBM. 2016a. IBM Analytics and Big Data. Retrieved from http://www.ibm.com/analytics/us/en/orhttp://www-01.ibm.com/software/data/bigdata/.Google Scholar
- IBM. 2016b. What is a Data Scientist? Retrieved from http://www-01.ibm.com/software/data/infosphere/data-scientist/.Google Scholar
- IDA. 2014. International Institute of Data 8 Analytics. Retrieved from www.datasciences.org.Google Scholar
- IEEEBD. 2014. IEEE Big Data Initiative. (2014). http://bigdata.ieee.org/.Google Scholar
- IFSC-96. 1996. Data Science, Classification, and Related Methods. Retrieved from http://d-nb.info/955715512/04.Google Scholar
- IJDS. 2016. International Journal of Data Science. (2016). http://www.inderscience.com/jhome.php?jcode=ijds.Google Scholar
- IJRDS. 2017. International Journal of Research on Data Science. Retrieved from http://www.sciencepublishinggroup.com/journal/index?journalid=310.Google Scholar
- INFORMS. 2014. Candidate Handbook. Retrieved from https://www.informs.org/Certification-Continuing-Ed/Analytics-Certificati on/Candidate-Handbook.Google Scholar
- INFORMS. 2016. Institute for Operations Research and the Management Sciences. Retrieved from https://www.informs.org/.Google Scholar
- Shuichi Iwata. 2008. Scientific “agenda” of data science. Data Science Journal 7, 5 (2008), 54--56. Google Scholar
Cross Ref
- H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. 2014. Big data and its technical challenges. Communications of the ACM 57, 7 (2014), 86--94. Google Scholar
Digital Library
- H. V. Jagadish. 2015. Big data and science: Myths and reality. Big Data Research 2, 2 (2015), 49--52. Google Scholar
Digital Library
- JDS. 2002. Journal of Data Science. Retrieved from http://www.jds-online.com/.Google Scholar
- JDSA. 2015. International Journal of Data Science and Analytics (JDSA). Retrieved from http://www.springer.com/41060.Google Scholar
- JFDS. 2016. The Journal of Finance and Data Science. Retrieved from http://www.keaipublishing.com/en/journals/the-journal-of-finance-and-data-science/.Google Scholar
- Kaggle. 2016. Kaggle Competition Data. Retrieved from https://www.kaggle.com/competitions.Google Scholar
- Surya R. Kalidindi. 2015. Data science and cyberinfrastructure: Critical enablers for accelerated development of hierarchical materials. International Materials Reviews 60, 3 (2015), 150--168. Google Scholar
Cross Ref
- KDD89. 1989. IJCAI-89 Workshop on Knowledge Discovery in Databases. Retrieved from http://www.kdnuggets.com/meetings/kdd89/index.html.Google Scholar
- KDnuggets. 2015. Visualization Software. Retrieved from http://www.kdnuggets.com/software/visualization.html.Google Scholar
- Kdnuggets. 2016. Kdnuggets. Retrieved from http://www.kdnuggets.com/.Google Scholar
- K Kelly. 2012. The quantified century. In Quantified Self Conference. Retrieved from http://quantifiedself.com/conference/Palo-Alto-2012.Google Scholar
- Nawsher Khan, Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, and et al. 2014. Big data: Survey, technologies, opportunities, and challenges. The Scientific World Journal 2014 (2014), 18. Google Scholar
Cross Ref
- John King and Roger Magoulas. 2015. 2015 Data Science Salary Survey. Retrieved from http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf.Google Scholar
- Ron Kohavi, Neal J. Rothleder, and Evangelos Simoudis. 2002. Emerging trends in business analytics. Communications of the ACM 45, 8 (2002), 45--48. Google Scholar
Digital Library
- AMP Lab. 2016. MLBase. Retrieved from http://mlbase.org/.Google Scholar
- Alexandros Labrinidis and H. V. Jagadish. 2012. Challenges and opportunities with big data. Proceedings of the VLDB Endowment 5, 12 (2012), 2032--2033. Google Scholar
Digital Library
- Douglas Laney. 2001. 3D Data Management: Controlling Data Volume, Velocity and Variety. Technical Report, META Group.Google Scholar
- David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. The parable of Google flu: Traps in big data analysis. Science 343 (2014), 1203--1205. Google Scholar
Cross Ref
- LDC. 2016. Linguistic Data Consortium. Retrieved from https://www.ldc.upenn.edu/about.Google Scholar
- LinkedIn. 2016. LinkedIn Jobs. Retrieved from https://www.linkedin.com/jobs/data-scientist-jobs.Google Scholar
- Mike Loukides. 2011. The Evolution of Data Products. O’Reilly, Cambridge.Google Scholar
- Mike Loukides. 2012. What is Data Science? O’Reilly Media, Sebastopol, CA. http://radar.oreilly.com/2010/06/what-is-data-science.htmldata-scientists.Google Scholar
- Andrea Manieri, Steve Brewer, Ruben Riestra, Yuri Demchenko, Matthias Hemmje, Tomasz Wiktorski, Tiziana Ferrari, and Jrmy Frey. 2015. Data science professional uncovered: How the EDISON project will contribute to a widely accepted profile for data scientists. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom’15). 588--593. Google Scholar
Digital Library
- Kate Matsudaira. 2015. The science of managing data science. Communications of the ACM 58, 6 (2015), 44--47. Google Scholar
Digital Library
- McKinsey. 2011. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.Google Scholar
- Claire Cain Miller. 2013. Data science: The numbers of our lives. New York Times Retrieved from http://www.nytimes.com/2013/04/14/education/edlife/universities-offer-courses-in-a-hot-new-field-data-science.html?pagewanted=all8_r=0.Google Scholar
- Arthur John Havart Morrell (Ed.). 1968. Information processing. In Proceedings of IFIP Congress 1968. Edinburgh, UK.Google Scholar
- Peter Murray-Rust. 2007. Data-driven science: A scientist’s view. In NSF/JISC 2007 Digital Repositories Workshop. http://www.sis.pitt.edu/repwkshop/papers/murray.pdf.Google Scholar
- Peter Naur. 1968. ‘Datalogy’, the science of data and data processes. In Proceedings of IFIP Congress 1968, 1383--1387.Google Scholar
- Peter Naur. 1974. Concise Survey of Computer Methods. Studentlitteratur, Lund, Sweden.Google Scholar
- NCSU. 2007a. Institute for Advanced Analytics, North Carolina State University. Retrieved from http://analytics.ncsu.edu/.Google Scholar
- NCSU. 2007b. Master of Science in Analytics, Institute for Advanced Analytics, North Carolina State University. Retrieved from http://analytics.ncsu.edu/.Google Scholar
- Michael L. Nelson. 2009. Data-driven science: A new paradigm? EDUCAUSE Review 44, 4 (2009), 6--7.Google Scholar
- NICTA. 2016. National ICT Australia. Retrieved from https://www.nicta.com.au/.Google Scholar
- NIST. 2015. NIST Text Retrieval Conference Data. Retrieved from http://trec.nist.gov/data.html.Google Scholar
- NSB. 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Retrieved from http://www.nsf.gov/pubs/2005/nsb0540/.Google Scholar
- NSF. 2007. US NSF07-28. Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf.Google Scholar
- OECD. 2007. OECD Principles and Guidelines for Access to Research Data from Public Funding. Retrieved from https://www.oecd.org/sti/sci-tech/38500813.pdf.Google Scholar
- OPENedX. 2016. OPENedX Online Education Platform. Retrieved from https://open.edx.org/.Google Scholar
- Tim O’Reilly. 2005. What is Web 2.0. Retrieved from http://oreilly.com/pub/a/web2/archive/what-is-web-20.html?page=3.Google Scholar
- D. J. Patil. 2011. Building Data Science Teams. O’Reilly Media.Google Scholar
- Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V. Weber. 1993. Capability maturity model version 1.1. IEEE Software 10, 4 (1993), 18--27. Google Scholar
Digital Library
- Gil Press. 2013. A Very Short History of Data Science. Retrieved from http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/61ae3ebb69fd.Google Scholar
- Xuesen Qian. 1991. Revisiting issues on open complex giant systems. International Journal of Pattern Recognition and Artificial Intelligence 4, 1 (1991), 5--8.Google Scholar
- Xuesen Qian, Jingyuan Yu, and Ruwei Dai. 1993. A new discipline of science—The study of open complex giant system and its methodology. Chinese Journal of Systems Engineering 8 Electronics. 4, 2 (1993), 2--12.Google Scholar
- RapidMiner. 2016. RapidMiner. (2016). https://rapidminer.com/.Google Scholar
- Samantha Renae. 2011. Data analytics: Crunching the future. Bloomberg Businessweek (2011). September 8.Google Scholar
- Solutions Review. 2016. Data Integration and Application Integration Solutions Directory. Retrieved from http://solutionsreview.com/data-integration/data-integration-solutions-directory/.Google Scholar
- C. Rudin, D. Dunson, R. Irizarry, H. Ji, E. Laber, J. Leek, T. McCormick, Sherri Rose, C. Schafer, M. van der Laan, L. Wasserman, and L. Xue. 2014. Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society. Retrieved from http://www.amstat.org/policy/pdfs/BigDataStatisticsJune2014.pdf American Statistical Association.Google Scholar
- SAS. 2013. Big Data Analytics: An Assessment of Demand for Labour and Skills, 2012-2017. Retrieved from https://www.thetechpartnership.com/globalassets/pdfs/research-2014/bigdata_report_nov14.pdf Report. SAS/The Tech Partnership.Google Scholar
- SAS. 2016. SAS Retrieved from http://www.sas.com/en_us/insights.html.Google Scholar
- Tobias Schoenherr and Cheri Speier-Pero. 2015. Data science, predictive analytics, and big data in supply chain management: Current state and future potential. Journal of Business Logistics 36, 1 (2015), 120--132.
- SIAM. 2016. SIAM career center. Retrieved from http://jobs.siam.org/home/.
- Christoph Siart, Simon Kopp, and Jochen Apel. 2015. The interface between data science, research assessment and science support—Highlights from the German perspective and examples from Heidelberg University. In Proceedings of the 2015 IIAI 4th International Congress on Advanced Applied Informatics (IIAI-AAI’15). 472--476.
- Silk. 2016. Data Science University Programs. Retrieved from http://data-science-university-programs.silk.co/.
- Larry Smarr. 2012. Quantifying your body: A how-to guide from a systems biology perspective. Biotechnology Journal 7, 8 (2012), 980--991.
- F. Jack Smith. 2006. Data science as an academic discipline. Data Science Journal 5 (2006), 163--164.
- SSDS. 2015. Springer Series in the Data Sciences. Retrieved from http://www.springer.com/series/13852.
- Stanford. 2014. Stanford Data Science Initiatives, Stanford University. Retrieved from https://sdsi.stanford.edu/.
- Thomas R. Stewart and Claude McMillan, Jr. 1987. Descriptive and prescriptive models for judgment and decision making: Implications for knowledge engineering. In Expert Judgment and Expert Systems, Jeryl L. Mumpower, Ortwin Renn, Lawrence D. Phillips, and V. R. R. Uppuluri (Eds.). Springer-Verlag, London, 305--320.
- Michael Stonebraker, Sam Madden, and Pradeep Dubey. 2013. Intel ‘big data’ science and technology center vision and execution plan. SIGMOD Record 42, 1 (2013), 44--49.
- Alma Swan and Sheridan Brown. 2008. The skills, role and career structure of data scientists and curators: Assessment of current practice and future needs. Technical Report. University of Southampton.
- Melanie Swan. 2013. The quantified self: Fundamental disruption in big data science and biological discovery. Big Data 1, 2 (2013), 85--99.
- Technavio. 2016. Top 10 Healthcare Data Analytics Companies. Retrieved from http://www.technavio.com/blog/top-10-healthcare-data-analytics-companies.
- TFDSAA. 2013. IEEE Task Force on Data Science and Advanced Analytics. Retrieved from http://dsaatf.dsaa.co/.
- TOBD. 2015. IEEE Transactions on Big Data. Retrieved from https://www.computer.org/web/tbd.
- Predictive Analytics Today. 2016. 29 Data Preparation Tools and Platforms. Retrieved from http://www.predictiveanalyticstoday.com/data-preparation-tools-and-platforms/.
- John W. Tukey. 1962. The future of data analysis. The Annals of Mathematical Statistics 33, 1 (1962), 1--67.
- John W. Tukey. 1977. Exploratory Data Analysis. Pearson.
- Tutiempo. 2016. Global Climate Data. Retrieved from http://en.tutiempo.net/climate.
- UCI. 2016. UCI Machine Learning Repository. Retrieved from archive.ics.uci.edu/ml/.
- Udacity. 2016. Udacity Courses. Retrieved from https://www.udacity.com/courses/data-science.
- Udemy. 2016. Udemy Courses. Retrieved from https://www.udemy.com/courses/search/?ref=home&src=ukw&q=data+science&lang=en.
- UK. 2016. UK Big Data. Retrieved from http://www.rcuk.ac.uk/research/infrastructure/big-data/.
- UK-HM. 2012. UK HM Government. Retrieved from http://data.gov.uk/sites/default/files/Open_data_White_Paper.pdf.
- UK-OD. 2016. UK Open Data. Retrieved from http://data.gov.uk/.
- UMichi. 2015. Michigan Institute For Data Science, University of Michigan. Retrieved from http://midas.umich.edu/.
- UN. 2010. United Nations Global Pulse Projects. Retrieved from http://www.unglobalpulse.org/.
- US-OD. 2016. US Government Open Data. Retrieved from https://www.data.gov/.
- USD2D. 2016. US National Consortium for Data Science. Retrieved from data2discovery.org.
- USDSC. 2016. US Degree Programs in Analytics and Data Science. Retrieved from http://analytics.ncsu.edu/?page_id=4184.
- USNSF. 2012. US Big Data Research Initiative. Retrieved from http://www.nsf.gov/cise/news/bigdata.jsp.
- UTS. 2011. Master of Analytics (Research) and Doctor of Philosophy Thesis: Analytics, Advanced Analytics Institute, University of Technology Sydney. Retrieved from http://www.uts.edu.au/research-and-teaching/our-research/advanced-analytics-institute/education-and-research-opportuniti-1.
- UTSAAI. 2011. Advanced Analytics Institute, University of Technology Sydney. Retrieved from https://analytics.uts.edu.au/.
- David van Dyk, Montse Fuentes, Michael I. Jordan, Michael Newton, Bonnie K. Ray, Duncan Temple Lang, and Hadley Wickham. 2015. ASA Statement on the Role of Statistics in Data Science. Retrieved from http://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/.
- Vast. 2016. Visual Analytics Community. Retrieved from http://vacommunity.org/HomePage.
- Dan Vesset, Benjamin Woo, Henry D. Morris, Richard L. Villars, Gard Little, Jean S. Bozman, Lucinda Borovick, Carl W. Olofson, Susan Feldman, Steve Conway, Matthew Eastwood, and Natalya Yezhkova. 2012. Worldwide Big Data Technology and Services 2012-2015 Forecast. IDC.
- Ana Viseu and Lucy Suchman. 2010. Wearable Augmentations: Imaginaries of the Informed Body. Berghahn Books, New York, 161--184.
- Whitehouse. 2015. The White House Names Dr. D. J. Patil as the First U.S. Chief Data Scientist. Retrieved from https://www.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist.
- Wikipedia. 2016a. Comparison of Cluster Software. Retrieved from https://en.wikipedia.org/wiki/Comparison_of_cluster_software.
- Wikipedia. 2016b. Informatics. Retrieved from https://en.wikipedia.org/wiki/Informatics.
- Wikipedia. 2016c. List of Reporting Software. Retrieved from https://en.wikipedia.org/wiki/List_of_reporting_software.
- WIRED. 2014. How Europe can Seize the Starring Role in Big Data. Retrieved from www.wired.com/insights/2014/09/europe-big-data/.
- Gary Wolf. 2012. The data-driven life. New York Times. Retrieved from www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html.
- Jeff Wu. 1997. Statistics = Data Science? Retrieved from http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf.
- Yahoo. 2016. Yahoo Finance. Retrieved from finance.yahoo.com.
- Nathan Yau. 2009. Rise of the Data Scientist. Retrieved from http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/.
- Chris Yiu. 2012. The Big Data Opportunity. Retrieved from http://www.policyexchange.org.uk/images/publications/thepportunity.pdf.
- Bin Yu. 2014. IMS presidential address: Let us own data science. IMS Bulletin Online (Oct. 1, 2014).