Authentic Data Science Assessments in a Computer-Based Testing Environment

Katie Burak

University of British Columbia

Collaborators

Katie Burak
Assistant Professor of Teaching
Department of Statistics, UBC

Hedayat Zarkoob
Postdoctoral Research and Teaching Fellow
Department of Computer Science, UBC

Firas Moosvi
Lecturer
Department of Computer Science, UBC

MDS Academic team

We would also like to acknowledge the academic team of UBC’s MDS program for their contributions to the courses that informed this project:

  • Varada Kolhatkar
  • Tiffany Timbers
  • Prajeet Bajpai
  • Daniel Chen
  • Gittu George
  • Payman Nickchi
  • Joel Östblom
  • Alexi Rodríguez-Arelis
  • Andy Tai

Motivation



Have you ever tried to assess a coding-based learning objective in an LMS platform like Canvas or Blackboard?

Limitations of Traditional Platforms

  • Lack of an IDE (e.g., RStudio, Jupyter Notebooks) forces students to code in unfamiliar environments

  • Inauthentic assessment of student skills

  • Inability to use testing frameworks

  • No immediate feedback for students

  • We need a way for students to use computing environments they’re comfortable with, while maintaining exam integrity…

Computer-Based Testing Facility

  • UBC’s Computer-Based Testing Facility (CBTF) is platform agnostic and helps instructors run digital assessments at scale

Source: https://cbtf.ubc.ca/

Why a Computer-Based Testing Environment?

  • The demand for applied statistics and data science education is growing and so is the need for effective assessment tools.
  • Authentic assessments should mirror real-world applications.
  • Providing access to proper coding environments and IDEs helps bridge the gap between academic assessments and real-world tools.
  • Helps address concerns about the irresponsible use of GenAI by students.
  • Promotes equity and inclusion by standardizing the computing environment, ensuring assessments measure understanding rather than access to personal hardware or resources.

PrairieLearn

  • Developed at the University of Illinois, PrairieLearn supports dynamic question banks with randomized variants (West, Herman & Zilles, 2015).

  • Enables the creation of realistic, code-based assessments.

  • Supports automated feedback for students.

  • Automatically evaluates code functionality through test functions (see the sketch after this list).

  • Promotes efficiency in exam creation by allowing for question banks that evolve and expand over time.
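
To make the test-function idea concrete, here is a minimal sketch of how an autograder can score a submitted function against a set of reference cases and return immediate feedback. The function name `student_mean`, the test cases, and the scoring scheme are all hypothetical; PrairieLearn's externally graded questions use their own grader interface, and this only illustrates the underlying pattern.

```python
# Illustrative only: a simplified stand-in for the kind of test functions an
# autograder runs against a submitted solution. Names and scoring are
# hypothetical, not PrairieLearn's exact external-grader API.


def student_mean(values):
    # Stand-in for the student's submission, which a real grader would import
    # from the uploaded file.
    return sum(values) / len(values)


def run_tests(submitted_fn):
    """Run a small battery of checks and return a fractional score with feedback."""
    cases = [
        ([1, 2, 3, 4], 2.5),          # basic case
        ([10], 10.0),                 # single element
        ([-2.0, 2.0, 4.0], 4.0 / 3),  # floats and negatives
    ]
    passed, feedback = 0, []
    for args, expected in cases:
        try:
            result = submitted_fn(args)
            if abs(result - expected) < 1e-9:
                passed += 1
            else:
                feedback.append(f"mean({args}) returned {result}, expected {expected}")
        except Exception as err:  # submissions may crash; count as a failed case
            feedback.append(f"mean({args}) raised {type(err).__name__}: {err}")
    return passed / len(cases), feedback


score, notes = run_tests(student_mean)
print(f"Score: {score:.0%}")
for note in notes:
    print("-", note)
```

Because the tests run automatically, students can receive a score and targeted feedback as soon as they submit, rather than waiting for manual grading.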

Asynchronous Assessment

Randomized asynchronous assessments provide several key benefits, including:

  • Increased flexibility for students
  • Scalable question banks that can be expanded and refined over time
  • Reduced instructional workload by eliminating the need to create new exams each year
  • Reduced time allocated to manually grading questions

Master of Data Science Program

  • We implemented asynchronous computer-based assessments in UBC’s Master of Data Science (MDS) program this year.
  • Leveraging data from the MDS program, we explore how asynchronous assessments facilitated by PrairieLearn affect student learning and performance.

MDS Program Overview

  • MDS is an intensive, 10-month, cohort-based program.
  • Attracts a diverse cohort from various academic backgrounds.
  • Two semesters are divided into six month-long “blocks,” each consisting of four courses covering topics in statistics, machine learning, computing, and data science.
  • The program wraps up with a 2-month industry capstone project.
  • The resulting data track the same students across multiple courses over time.

Block 1 → Block 2 → Block 3 → Block 4 → Block 5 → Block 6 → Capstone

Canvas → PrairieLearn



  • Mapped existing exam questions to course learning objectives for clear organization and efficient assessment design.

  • Incorporated randomization in questions:

    • Variable parameters for dynamic question content.
    • Multiple variants that maintain consistent difficulty levels.
  • Let’s take a look at a simple sample question!
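
As a stand-in for the live demo, here is a minimal sketch of the server-side logic a PrairieLearn question can use to generate randomized variants: a `generate(data)` function in `server.py` fills `data["params"]` and `data["correct_answers"]`. The specific question (a binomial probability calculation) and the parameter ranges below are our own illustrative choices, not an actual MDS exam question.

```python
# A minimal sketch of a PrairieLearn-style server.py for a randomized variant.
# The question content and parameter ranges are illustrative only.
import random
from math import comb


def generate(data):
    # Draw parameters from narrow ranges so every variant has similar difficulty.
    n = random.randint(8, 12)            # number of trials
    k = random.randint(2, 4)             # number of successes asked about
    p = random.choice([0.2, 0.25, 0.3])  # success probability

    data["params"]["n"] = n
    data["params"]["k"] = k
    data["params"]["p"] = p

    # Store the correct answer so the platform can grade submissions automatically.
    data["correct_answers"]["prob"] = comb(n, k) * p**k * (1 - p) ** (n - k)


# Standalone demo (PrairieLearn would normally supply and consume `data`):
data = {"params": {}, "correct_answers": {}}
generate(data)
print(data["params"], data["correct_answers"])
```

The question text then references these parameters (e.g., via the platform's templating), so each student sees a different but comparably difficult variant, which is what keeps the question bank reusable across sittings.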

The Data

  • Deidentified data from 150 students in the first two blocks of MDS.

  • Two PrairieLearn quizzes per course, taken in a 5-day quiz window.

  • Looked at 6 courses over a 2-month span:

    • Programming for Data Science
    • Data Wrangling
    • Descriptive Statistics and Probability for Data Science
    • Algorithms and Data Structures
    • Statistical Inference and Computation I
    • Supervised Learning I

Results


Discussion

  • We found that performance was overall better for students who wrote their exams earlier in the quiz window.

  • Quiz performance remains fairly consistent over time, possibly suggesting that students are not attempting to “game” the system.

  • These results are in line with a University of Illinois study of the PrairieLearn system, “How Much Randomization is Needed to Deter Collaborative Cheating on Asynchronous Exams?” (Chen, West & Zilles, 2018).

  • The authors found that drawing each question from a randomized set of three or four problems with varying parameters is an effective way to reduce collaborative cheating (see the quick calculation after this list).

  • Over time, we aim to expand our question banks and introduce more layers of randomization.
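
To give a sense of why even modest pools help, here is a quick back-of-the-envelope calculation (our own simplification, not the model used by Chen, West & Zilles): if each of k question slots on an exam is filled independently and uniformly from a pool of m variants, the probability that two students receive exactly the same exam is (1/m)^k.

```python
# Back-of-the-envelope: chance that two students draw identical exams when each
# of `slots` question slots is filled independently and uniformly from a pool
# of `pool_size` variants. This independence model is our own illustration,
# not the analysis in Chen, West & Zilles (2018).
def p_identical_exam(pool_size: int, slots: int) -> float:
    return (1 / pool_size) ** slots


for pool_size in (2, 3, 4):
    for slots in (3, 5):
        print(f"pool of {pool_size}, {slots} slots: "
              f"{p_identical_exam(pool_size, slots):.3%}")
```

Even with pools of only three or four variants per slot, the chance that two students can share answers verbatim drops off quickly, which is consistent with the findings above.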

Chen, B., West, M., & Zilles, C. (2018, June). How much randomization is needed to deter collaborative cheating on asynchronous exams? In Proceedings of the Fifth Annual ACM Conference on Learning at Scale (pp. 1–10).

Thank you! Questions?