Introducing Data Science Techniques by Connecting Database Concepts and dplyr

Research output: Contribution to journal, Article

Abstract

Early exposure to data science skills, such as relational databases, is essential for students in statistics as well as many other disciplines in an increasingly data-driven society. The goal of the presented pedagogy is to introduce undergraduate students to fundamental database concepts and to illuminate the connection between these database concepts and the functionality provided by the dplyr package for R. Specifically, students are introduced to relational database concepts using visualizations designed for students with no data science or computing background. These educational tools, which are freely available on the Web, engage students in the learning process through a dynamic presentation that gently introduces relational databases and how to ask questions of data stored in a relational database. The visualizations are designed for self-study by students and include a formative self-assessment feature. Students are then assigned a corresponding statistics lesson that uses statistical software in R within the dplyr framework and emphasizes the need for these database skills. This article describes a pilot experience of introducing this pedagogy into a calculus-based introductory statistics course for mathematics and statistics majors, and provides a brief evaluation of the student perspective of the experience. Supplementary materials for this article are available online.

Original language: English (US)
Journal: Journal of Statistics Education
DOIs: 10.1080/10691898.2019.1647768
State: Accepted/In press - Jan 1 2019

Fingerprint

Relational Database
Statistics
Pedagogy
science
Visualization
student
Self-assessment
Statistical Software
Learning Process
Data-driven
Calculus
self-study
Concepts
Data base
Computing
Evaluation

Keywords

  • Data science
  • Databases
  • Education
  • Teaching tool

ASJC Scopus subject areas

  • Statistics and Probability
  • Education
  • Statistics, Probability and Uncertainty

Cite this

@article{54d9b06c97c642169540422913892c72,
title = "Introducing Data Science Techniques by Connecting Database Concepts and dplyr",
keywords = "Data science, Databases, Education, Teaching tool",
author = "Broatch, {Jennifer E.} and Suzanne Dietrich and Don Goelman",
year = "2019",
month = "1",
day = "1",
doi = "10.1080/10691898.2019.1647768",
language = "English (US)",
journal = "Journal of Statistics Education",
issn = "1069-1898",
publisher = "American Statistical Association",

}

TY - JOUR

T1 - Introducing Data Science Techniques by Connecting Database Concepts and dplyr

AU - Broatch, Jennifer E.

AU - Dietrich, Suzanne

AU - Goelman, Don

PY - 2019/1/1

Y1 - 2019/1/1

KW - Data science

KW - Databases

KW - Education

KW - Teaching tool

UR - http://www.scopus.com/inward/record.url?scp=85073829622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85073829622&partnerID=8YFLogxK

U2 - 10.1080/10691898.2019.1647768

DO - 10.1080/10691898.2019.1647768

M3 - Article

AN - SCOPUS:85073829622

JO - Journal of Statistics Education

JF - Journal of Statistics Education

SN - 1069-1898

ER -