Modernizing the undergraduate regression analysis course

eCOTS 2022

Maria Tackett, Mine Çetinkaya-Rundel, Rick Presman

Welcome

Introductions

Headshots of Maria Tackett, Mine Çetinkaya-Rundel, and Rick Presman

Dr. Maria Tackett

Dr. Mine Çetinkaya-Rundel

Rick Presman

Hex logo for workshop

One link for all materials

🔗 bit.ly/modern-regression

Session agenda

  • Background + motivation
  • Computing with tidymodels
  • Live demo + activity
  • Tips + putting it all together

Background + motivation

2014 ASA Curriculum Guidelines

“…concepts and approaches for working with complex data…and analyzing non-textbook data.”

“…students’ analyses should be undertaken in a well-documented and reproducible way”

“…construct effective visual displays and compelling written summaries” and “demonstrate ability to collaborate in teams…”

Full 2014 ASA Curriculum Guidelines Report

Assessing final projects

  • Final group project throughout second half of the course

    • Use regression analysis to analyze a data set of their choice
    • Produce a written report and presentation
  • Noticed students had challenges…

    • Preparing the data for analysis

    • Effectively summarizing model results

    • Making analysis decisions

How can we help students better use their conceptual knowledge and skills to analyze real-world data?

Inspired by introductory courses

Innovations in introductory courses in line with recommendations in the 2016 GAISE report.

Goal: Develop learning experiences that continue cultivating these skills beyond the introductory course

STA 210: Regression Analysis

A course primarily on linear and logistic regression with a focus on application.

  • Students: 90+ students from a range of disciplines who have taken introductory statistics, data science, or probability

  • Class meetings: 2 lectures with in-class activities and 1 lab

  • Assessments: Labs, homework, exams, final group project

Modernizing the course

Facilitate opportunities for students to…

  • Regularly engage with real-world applications and complex data

  • Develop proficiency using professional statistical software and using a reproducible workflow

  • Develop important non-technical skills, specifically written communication and teamwork

  • Identify appropriate methods based on the primary analysis objective: inference or prediction

Remainder of session

  • Tidymodels overview and demonstration

  • Hands-on activity with tidymodels and writing exercises

    • Goal: Get a glimpse of the in-class student experience
  • Tips + putting it all together

Computing using tidymodels

What is tidymodels?

The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.

Hex logos for tidymodels and a few of the packages in tidymodels
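
As a small illustration of what modeling "using tidyverse principles" can look like, the sketch below specifies a linear regression with parsnip, fits it, and returns the coefficients as a tibble with broom. The built-in mtcars data, the chosen predictors, and the object names lm_spec and mpg_fit are assumptions made for illustration; they are not taken from the workshop materials.

```r
library(tidymodels)

# Model specification: linear regression fit with the "lm" engine
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Fit the specification to the built-in mtcars data (illustrative choice)
mpg_fit <- lm_spec %>%
  fit(mpg ~ wt + hp, data = mtcars)

# tidy() returns the coefficient table as a tibble, one row per term
tidy(mpg_fit)
```

Because the result of tidy() is a tibble, it can be passed directly to dplyr verbs or a table-formatting function when writing up results.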

Getting started with tidymodels

# install.packages("tidymodels")
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 0.2.0 ──
✔ broom        0.8.0          ✔ recipes      0.2.0     
✔ dials        0.1.1          ✔ rsample      0.1.1     
✔ dplyr        1.0.9          ✔ tibble       3.1.7     
✔ ggplot2      3.3.6          ✔ tidyr        1.2.0.9000
✔ infer        1.0.0          ✔ tune         0.2.0     
✔ modeldata    0.1.1          ✔ workflows    0.2.6     
✔ parsnip      0.2.1          ✔ workflowsets 0.2.1     
✔ purrr        0.3.4          ✔ yardstick    0.0.9     
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org

Tidymodels flow

Diagram of the overall model creation process. Data are split into training and testing sets. Labeled stages: model and feature development (workflow, model specification, recipe / formula, model / feature tuning), within model selection, between model selection, final fit, and model assessment.
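
One way the stages of this flow might look in code is sketched below: split the data, describe feature steps in a recipe, bundle the recipe with a model specification in a workflow, fit on the training set, and assess on the testing set. The mtcars data, the seed, the chosen predictors, and all object names are illustrative assumptions. Tuning and model selection are omitted for brevity.

```r
library(tidymodels)

set.seed(210)  # arbitrary seed so the random split is reproducible

# Data -> training and testing sets
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Recipe: outcome, predictors, and feature engineering steps
car_rec <- recipe(mpg ~ wt + hp, data = car_train) %>%
  step_normalize(all_numeric_predictors())

# Model specification
car_spec <- linear_reg() %>%
  set_engine("lm")

# Workflow: recipe + model specification
car_wflow <- workflow() %>%
  add_recipe(car_rec) %>%
  add_model(car_spec)

# Fit on the training data only
car_fit <- fit(car_wflow, data = car_train)

# Model assessment: predict on the testing data and compute RMSE
car_preds <- predict(car_fit, new_data = car_test) %>%
  bind_cols(car_test)

rmse(car_preds, truth = mpg, estimate = .pred)
```

Keeping the recipe inside the workflow means the normalization is estimated from the training data and then applied unchanged to the testing data at prediction time.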

Live demo + activity

  • Go to bit.ly/modern-regression and click on the Activity link on the left.
  • Join the RStudio Cloud workspace linked on top of the activity document.
  • Follow along!

Tips + putting it all together

Complex data

  • Use authentic data sets that require some wrangling
    • Reinforces exploratory data analysis and that raw data is not always “ready to go” without some preparation
    • Extent of data wrangling differs based on type of assignment and time to complete it
  • Avoid overusing examples in which model assumptions are violated or regression is not a useful way to analyze the data

Resources for finding data

Teamwork

  • Teams of 3 or 4 students assigned based on
    • previous statistics and computing experience
    • major or academic interests
    • the goal of giving each student at least one potential point of connection with their teammates
  • Groups work together throughout the semester on weekly lab assignments and the final project

Teamwork

  • The first team assignment includes

    • Completing a team agreement

    • Coming up with a fun team name!

  • Teamwork is assessed based on contribution and collaboration
    • GitHub commit history on assignments to assess contribution
    • Periodic team feedback to assess collaboration

Putting it all together

Skills from the 2014 ASA Curriculum Guidelines

  • “…concepts and approaches for working with complex data…and analyzing non-textbook data.”
  • “…students’ analyses should be undertaken in a well-documented and reproducible way”
  • “…construct effective visual displays and compelling written summaries” and “demonstrate ability to collaborate in teams…”

It is important that students not only develop these skills but also learn how to use them in practice.

Questions?