DATA SCIENCE
1 Introduction
How can we effectively and efficiently teach data science to students with little to no background in computing and statistical thinking? How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more? This article describes an introductory data science course that is our (working) answer to these questions.
At its core, the course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, inference, modeling, and effective communication of results.
ABSTRACT
The proliferation of vast quantities of available datasets that are large and complex in nature has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational set of skills required to effectively plan, acquire, manage, analyze, and communicate the findings of such data. To keep up with this demand, attracting students early on to data science as well as providing them a solid foray into the field becomes increasingly important. We present a case study of an introductory undergraduate course in data science that is designed to address these needs. Offered at Duke University, this course has no prerequisites and serves a wide audience of aspiring statistics and data.
2 Background and Related Work
An exact characterization of what the field of data science is meant to encompass is still debated. However, in this article, we define data science as the “science of planning for, acquisition, management, analysis of, and inference from data.” We review four of the most recent curriculum guidelines for undergraduate programs in data science, statistics, and computer science to assess how the case study course measures up against them.
While the 2013 Computer Science Curricula of the Association for Computing Machinery (ACM) (Sahami et al.) make no suggestions for integrating data science into a computer science major, the 2019 report by the ACM Task Force on Data Science Education suggests core competencies that a graduating data science student should have. Each competency corresponds to one of nine data science knowledge areas: computing fundamentals; data acquirement and governance; data management, storage, and retrieval; data privacy, security, and integrity; machine learning; data mining; big data; analysis and presentation; and professionalism. The report also suggests that a full data science curriculum should integrate courses in “calculus, discrete structures, probability theory, elementary statistics, advanced topics in statistics, and linear algebra.” We note, however, that this document was still a draft at the time of writing this article.