Sean Kross

Visualize Data Analysis Pipelines with Tidy Data Tutor

The data frame is one of the most important and fundamental data structures in R. It is no coincidence that one of the leading domain-specific languages in R, the Tidyverse, is designed around the transformation and manipulation of data frames. A key abstraction of the Tidyverse is the use of individual functions that each make one change to a data frame, coupled with a pipe operator, which allows people to write sophisticated yet modular data processing pipelines. However, within these pipelines it is not always clear how each operation changes the underlying data frame, especially as pipelines become long and complex. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams or making presentation slides to illustrate the semantics of operations such as filtering, sorting, reshaping, pivoting, grouping, and joining. These diagrams are time-consuming to create and do not stay synchronized with the real code or data that students are learning about. In this talk I will introduce Tidy Data Tutor, a step-by-step visual representation engine for data frame transformations that can help instructors explain these operations. Tidy Data Tutor illustrates the row-, column-, and cell-wise relationships between an operation’s input and output data frames. We hope the Tidy Data Tutor project can augment data science education by providing an interactive and dynamic visualization tool that streamlines the explanation of data frame operations and fosters a deeper understanding of Tidyverse concepts for students.



Pronouns: he/him
Seattle, WA, USA
Sean Kross, PhD, is a Staff Scientist at the Fred Hutch Data Science Lab. His work focuses on understanding data science as a practice, building a better developer experience for data scientists, and creating better outcomes in digital education. He approaches these challenges with computational, statistical, ethnographic, and design-driven methods.