2021 Lightning Talks

Asha Yadav

Pronouns: she/her/hers
Eugene, OR

Session: Lightning Talks

Data-Informed Decision Making in Part C Early Intervention using Shiny R

Data usage and data-based decision-making are indispensable for improving Part C Early Intervention (EI) services and achieving positive outcomes for children with disabilities and their families. The aggregated Part C data that states report to meet federal reporting requirements are collected and stored at the EI program level. Despite a robust data systems framework for Part C EI and ECSE services, devised by Early Childhood Technical Assistance (ECTA) to guide states on data governance, management, and utilization, there are several unexamined assumptions and gaps at the EI program level. There is a lack of evidence in the literature on what data analytic tools are available to local EI programs and how they utilize data for informed decision-making to monitor and improve program practices and strategies to achieve positive outcomes for children and families. In this article, we present a case study of a local EI program where barriers restricting data use were removed by the use of open-source data science programming in R. The study used Part C county-level data to address the need for data in operational decision-making, accountability, and monitoring at the EI program level. The data were explored using interactive data visualization, and results were disseminated to stakeholders through an interactive, data-driven dashboard in Shiny R. A collaborative methodological approach was adopted for data analysis. Through this case study, we discuss the challenges, learnings, and future directions for improving data utilization with open-source data science tools at the Part C EI program level to provide high-quality services to young children with or at risk of disabilities and their families.

Bio: Asha has extensive international experience in working with children with disabilities and their families from marginalized communities such as asylum seekers, single-parent homes, traveler communities, and children in foster care from diverse ethnicities, in particular Asian, Middle Eastern, European, and North African, living in the UK, Europe, and India. Asha has represented families of disabled children on several advisory boards and forums, including the National Network of Parent Carer Forum (England) and the Council for Disabled Children, UK. She has taken the lead on the implementation of large-scale collaborative projects putting policy into practice, such as the Every Disabled Child Matters agenda in the UK and EHCP (Education, Health, and Care Plan) under the Children and Families Act 2014 in England. Asha’s research interest focuses on understanding the influence of ecological factors on family relationships and early social-emotional development within the context of culture in Early Intervention and Early Childhood Special Education. She is currently pursuing a Ph.D. in Special Education and Clinical Sciences with a concentration in Quantitative Analysis, Data Science in R, and Implementation Science from the University of Oregon.

Dror Berel


Session: Lightning Talks

Been there, done that. Practices I adopted from R that will guide me through learning additional programming languages.

Adding another programming language to your analytical skill set is an important step. However, after getting used to R's unique dialect, it may be tricky to adapt to a different one. Learning a new language also does not mean you need to abandon the practices that have already proved useful in your day-to-day work.

In this talk I will share the painful-but-empowering process I am going through in learning Python after years of practicing R. I will share the mindset that helped me stretch out of my R comfort zone and list best practices that I adopted from R that will guide me through my journey. Some examples are the use of list comprehensions and hierarchical indexing (MultiIndex).
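As a small illustration of the first example named above, the sketch below shows how a Python list comprehension can stand in for element-wise mapping and filtering that an R user might write with `sapply()` or logical subsetting. The data and variable names are hypothetical and are not taken from the talk.

```python
# Hypothetical data: exam scores for a small class (not from the talk).
scores = [72, 88, 95, 61, 79]

# R habit: sapply(scores, function(x) x / 100)
# Python list comprehension: build a new list by transforming each element.
proportions = [s / 100 for s in scores]

# R habit: scores[scores >= 75]
# Python list comprehension with a condition: keep only matching elements.
passing = [s for s in scores if s >= 75]

print(proportions)
print(passing)
```

The comprehension keeps the "transform a whole collection in one expression" habit that R encourages, without an explicit loop and accumulator.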

Bio: Dror Berel enjoys analyzing and exploring data, especially using his favorite open-source tool: R. With a background in both academia and industry, he was an early adopter of R and started using it nearly 20 years ago; he's thrilled to see how it's grown since then. He also likes to blog about R at https://drorberel.medium.com/

Edgar Zamora

Big Bend Community College, Moses Lake, WA

Session: Lightning Talks

How Did I Do That?: Redesigning A Workflow In R

Have you ever found yourself asking “What does this mean?” or “How did I do that?” when it comes time to reproduce your weekly, monthly, or annual report(s)? If so, taking a closer look at your current workflow may help identify areas that are too time-consuming. With data science techniques in mind, this talk demonstrates how you can use R to redesign a workflow to become more efficient, consistent, and reproducible. Using features available in RStudio and R packages like {dbplyr}, {bookdown}, and {janitor}, you will learn tools to apply to your current workflow.

Bio: Edgar Zamora is a Data Consultant at Big Bend Community College in Moses Lake, WA. His experience and love for R began while in graduate school at the University of Oklahoma where he explored American political behavior. In his current position, he uses R daily to provide data to faculty and staff to aid in building a strategic plan to help students succeed in college. Through his work, he strives to improve workflows, build R packages, and connect to databases to help support his work and other R users.

Erin Dahl

Pronouns: she/her
OHSU, Portland, OR

Session: Lightning Talks

Do you see what I see? Introducing microshades: An R package for improving color accessibility and organization of complex data

Approximately 300 million people in the world have Color Vision Deficiency (CVD), which is comparable to the most recent estimate of the US population. Individuals with CVD do not experience a complete loss of color vision, though the ability to distinguish different colors is reduced. When creating figures and graphics that use color, it is important to consider that people with CVD will interact with this material and may not perceive all of the information tied to the colors correctly. There are multiple CVD-friendly color palettes available in R to apply to visuals; however, they are restricted to 8 different colors. When working with complex data, such as microbiome data, this is insufficient. To overcome this limitation, we developed the microshades R package to provide custom color shading palettes that improve accessibility and data organization. Each color palette contains six base colors with five incremental light-to-dark shades, for a total of 30 available colors per palette type that can be directly applied to any plot. This package includes two crafted color palettes, microshades_cvd_palettes and microshades_palettes. The microshades_cvd_palettes contain colors that are universally CVD-friendly. The individual microshades_palettes are CVD-friendly, but when several microshades_palettes are used together, they are not universally accessible. In addition to color palettes, the microshades package contains functions to aid in data visualization, including functions for creating stacked bar plots organized by a data-driven hierarchy. The microshades package can be used in conjunction with common microbiome R packages, such as phyloseq, to enhance microbiome data visualization. In the case of microbiome data, the base colors correspond to a higher-order taxonomic group (e.g. phylum) and shades of the base color represent subgroups of the taxonomic group (e.g. genus). Subgroup shading is determined by the abundance in the dataset. Darker shades indicate the most abundant subgroup for each group, and lighter shades represent less abundant subgroups. To further assist users with data storytelling, we have functions to sort data both vertically and horizontally based on ranked abundance or user specification. The accessibility and advanced color organization features described help data reviewers and consumers notice visual patterns and trends more easily. Examples of microshades in action, for both microbiome and other datasets, are available on our website: https://karstenslab.github.io/microshades
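The shading rule described above (the most abundant subgroup within each group gets the darkest of five shades) can be sketched in a few lines. This is a toy illustration in Python, not the microshades API: the function name, the data, and the choice to give every subgroup ranked beyond the fifth the lightest shade are assumptions made for the example.

```python
# Toy illustration only: this is NOT the microshades API.
# Hypothetical abundances: group (e.g. phylum) -> subgroup (e.g. genus) -> abundance.
abundance = {
    "Firmicutes": {"Lactobacillus": 40, "Streptococcus": 25, "Clostridium": 10},
    "Proteobacteria": {"Escherichia": 15, "Pseudomonas": 30},
}

N_SHADES = 5  # five light-to-dark shades per base color, as described above


def assign_shades(subgroups, n_shades=N_SHADES):
    """Map each subgroup to a shade level: n_shades = darkest, 1 = lightest.

    The most abundant subgroup gets the darkest shade. Subgroups ranked beyond
    n_shades share the lightest shade (an assumption made for this sketch).
    """
    ranked = sorted(subgroups, key=subgroups.get, reverse=True)
    return {name: max(n_shades - rank, 1) for rank, name in enumerate(ranked)}


# One shade mapping per group; each group would use its own base color.
shades = {group: assign_shades(subs) for group, subs in abundance.items()}

for group, mapping in shades.items():
    print(group, mapping)
```

Keeping the base color tied to the group and the shade tied to the within-group rank is what lets a reader pick out both the higher-order group and its dominant subgroup from color alone.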

Bio: Erin Dahl recently graduated with her B.S. in Bioinformatics from Pacific University and is now working as a Research Assistant in the Karstens Lab at OHSU. As a member of the Karstens Lab, she has worked hard to improve data visualization and is excited to share her work on the new microshades R package.

Howard Baek

Pronouns: he/his
University of Washington, Seattle, WA

Session: Lightning Talks

Feedback Report: A Web-Based Shiny Dashboard for displaying Patient-Reported Outcome Measures for Patients in Addiction Treatment

We have developed a web-based Shiny Dashboard, “Feedback Report”, for clinicians and patients in addiction treatment for clinical monitoring of patient progress and goals over time. Feedback Report imports data from a secure REDCap database containing patient-reported outcome measures reported longitudinally by patients on a weekly basis during their addiction treatment. It then generates an interactive dashboard containing plotly line graphs that illustrate changes in patient-reported progress and goal measures across multiple domains (e.g., drinking, craving, coping skills, depression). Feedback Report operates as an interactive dashboard where clinicians can review progress for multiple patients and can overlay progress on multiple domains within a single graph to see associations between different outcome domains over time. We designed Feedback Report to be easily modifiable so clinics can choose which measures can be administered and graphed, including custom measures selected or created by users. This is enabled by the R package purrr that we applied to iterate over the data and update the plots and table in real time. Also, the appearance of the graphs and table can be configured with a Google sheet that is connected to the dashboard with the R package googlesheets4. Layouts that are optimized for both computer and smartphone (using shinyMobile) are available.

Bio: My name is Howard Baek and I am a Statistics major with a Mathematics minor at the University of Washington. As a Research Assistant at the Behavioral Research In Technology and Engineering (BRiTE) Center, housed at the University of Washington's School of Medicine, I developed a Shiny Dashboard (“Feedback Report”) that allows patients and clinicians in addiction treatment to monitor patients’ progress and goals over time. This dashboard implements Plotly graphs that illustrate changes in patient-reported progress and goal measures over time. For optimal viewing on a smartphone, I programmed a shinyMobile application of the dashboard.

Jill Levine

Victoria, BC

Session: Lightning Talks

The Clio Package: Clean and Simple Historical Statistics

At Seshat: Global History Databank, we are interested in studying how the social and political organization of human societies and civilizations has changed over time. We systematically collect what is currently known through secondary source research to construct large datasets covering a wide set of variables, including those on governance, religion, infrastructure, agriculture, and more. One of our projects, Consequences of Crisis, looks at how different societies from prehistory to more recent times respond to periods of pressure and crisis, namely whether crisis periods result in major war, minor turbulence, or something in between. I serve as a researcher at Seshat and am working on data collection, cleaning, and analysis for the Consequences of Crisis project. In my work so far, I have found a number of useful open-source outside datasets of historical statistics, from population, to military deaths, to gender equality. Some of these outside datasets are extremely messy, and some need little or no cleaning before they can be used in analysis. One of the best dataset libraries is from Clio Infra (https://clio-infra.eu/), created by the Netherlands Organization for Scientific Research in 2010. These data are available as csv downloads, but downloading them individually can get cumbersome. A PhD student named Bas Michelson has created a pretty incredible remote package that lets users easily download multiple Clio Infra datasets and filter for specific countries or time periods in a few simple steps. This talk will introduce aspects of the package and give examples of simple visualizations that can be made with Clio Infra data. This talk is appropriate for R beginners and anyone else who is interested!

Bio: Jill Levine is a Research Assistant and Project Manager at Seshat: Global History Databank, and a recent MA graduate in History from the University of Victoria. She is interested in the collection, cleaning, management, and visualization of historical data. She lives in Victoria, BC with several housemates and several dogs.

Katie Jolly

Pronouns: she/her
Seattle, WA

Session: Lightning Talks

Designing graphics to post online: What I’ve learned from the (sometimes helpful) comments on my maps from Twitter and Reddit

I enjoy designing maps and other data visualizations. It’s something that takes a lot of practice, and I’ve found that I can learn a lot from posting mine online. As data visualization designers, it can be easy to forget what it’s like to see a dataset for the first time. Social media gives us a great way to collect that kind of feedback from a lot of people at once. One of the great things about social media is that I can get feedback from domain experts, interested hobbyists, designers, and other people who just happen to see my post, all at the same time. I use my Twitter account to share work I’ve done or things I have in progress. Fairly reliably, and especially if prompted, people will respond with comments. In this talk I will discuss some of the lessons I’ve learned from these comments: creating better legends, choosing colors, and using text to add more information, for example. People listening to this talk will learn strategies for evaluating and learning from their audience, whether they’re creating graphics for the first time or have been publishing a blog for years. I think the ideal audience is beginner to intermediate, but anyone can walk away with new ideas for their work.

Bio: Katie works as a data analyst at Ookla. She enjoys working on data visualization, particularly map design.

Keisha R Harrison

Oregon State University, Corvallis, OR

Session: Lightning Talks

Defining Complex Microbial Systems within Kombucha Fermentations Using METACODER and PHYLOSEQ Libraries

Kombucha, an increasingly popular beverage in North America, is made from the fermentation of sweetened tea and an inoculum of bacteria and yeast, commonly referred to as a symbiotic culture of bacteria and yeast (SCOBY) (Villarreal‐Soto et al., 2018). To better characterize Kombucha SCOBY diversity, metagenomic approaches were used to evaluate microbial spatial homogeneity within a commercial SCOBY and taxonomic diversity across a large sample population (n = 104) of SCOBY used by Kombucha brewers. Data obtained from the metabarcoding of the 16S and ITS ribosomal RNA genes were analyzed using the microbiome profiler Phyloseq (McMurdie & Holmes, 2013) to show that the global microbial community is dominated by Brettanomyces bruxellensis (Br.), Br. anomalus, Komagataeibacter xylinus (Kom.), and Kom. rhaeticus. Subsequently, K-means clustering and the visualization package MetaCoder (Foster & Sharpton, 2017) were used to delineate the population into four SCOBY types and describe significant microbial structural differences between the cluster designations. Cluster communities that lacked the dominant genera of bacteria and yeast presented a higher abundance of a combination of Zygosaccharomyces, Starmerella, Lachancea, and Lactobacillaceae, suggesting a compensatory pattern of co-occurrence. Combining these two R packages was a novel approach to identifying Kombucha starter culture types from a global population.

Bio: Keisha is a PhD candidate at Oregon State University in Food Science and Technology with a focus on Microbial Ecology and Microbiome Data Analysis. Her work involves the study of cross-kingdom microbial interactions in fermentation systems, specifically using Kombucha as a model system. Currently, she is using R microbiome packages, including Phyloseq, to process large volumes of sequence information from 16S and ITS amplification datasets.

Kevin Floyd

Simon Fraser University Alumni

Session: Lightning Talks

March Sadness: Building a Hands-on College Basketball Simulator in R

In March 2020, the sports world shut down at the onset of the COVID-19 pandemic. Among the casualties was the 2020 NCAA men's basketball tournament, canceled a week before it was scheduled to begin. In lieu of actual “March Madness”, I created a college basketball simulator in R based on player statistics, team trends, and, for added randomness, a single random seed. Repeating this pseudo-stochastic model of possessions in a basketball game, including some strategic decisions when necessary, I simulated the 2020 NCAA tournament that never was and the actual 2021 NCAA tournament. The inherent randomness in the simulations produced exciting results, some of which proved to be prescient, others wildly inaccurate. Reporting on the simulated games over social media as if they were real proved to be a fun diversion for myself and my inner circle during the uncertainty of the early pandemic. This simulator doesn't solve the world's problems, but it is an illustration of the versatility of the R language and the types of programs one can create with it.
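The idea of a seeded, possession-by-possession simulation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the author's simulator: the team names, the `p_score` probabilities, the fixed two points per score, and the 70 possessions per team are all hypothetical, and the real model also draws on player statistics, team trends, and strategic decisions.

```python
import random


def simulate_game(team_a, team_b, possessions=70, seed=2020):
    """Simulate one game as alternating possessions.

    Each team is a hypothetical dict with a name and the probability of
    scoring on a possession; every score is worth 2 points for simplicity.
    """
    rng = random.Random(seed)  # a single seed makes the run reproducible
    score = {team_a["name"]: 0, team_b["name"]: 0}
    for _ in range(possessions):
        for team in (team_a, team_b):
            # One possession: the team scores with probability p_score.
            if rng.random() < team["p_score"]:
                score[team["name"]] += 2
    return score


# Hypothetical teams, not real tournament entrants.
team_a = {"name": "Team A", "p_score": 0.50}
team_b = {"name": "Team B", "p_score": 0.45}

print(simulate_game(team_a, team_b, seed=2020))
```

Fixing the seed reproduces the same game exactly, while changing it yields a different but equally plausible outcome, which is where the upsets come from.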

Bio: Kevin Floyd is a data analyst for ICON Health and Fitness and a COVID-19 data specialist for the Bear River Health Department, both in Logan, Utah. He received his M.Sc. in Statistics from Simon Fraser University in 2019, supervised by Dr. Tom Loughin, and his B.S. in Statistics from Utah State University in 2016. Kevin's academic and extracurricular statistical work focuses on applications in sports, particularly men's and women's college basketball.

Reiko Okamoto

Pronouns: she/her
Government of British Columbia

Session: Lightning Talks

Bringing analysts and technical writers together through internal packages

Restructuring frequently used code snippets into well-organized internal packages is easier said than done. In this presentation, we share useful tools for package development and our approach to planning a package before writing the first line of code. We will discuss how to overcome challenges in this pre-coding phase, which is not as well documented as the technical aspects of the package development process. How do you find out what exactly your team needs? How do you encourage team members to adopt the package and spread R culture? We answer these questions through a case study reflecting on our team’s experience creating a ggplot2 extension to meet the organization’s style guide for healthcare utilization reports. This package allows our analysts to focus on identifying trends in the data to share with the writing team instead of modifying their plots to get the look and feel right each time. Furthermore, this package has fostered an environment of collaboration by acting as a bridge between the analysts and technical writers in our organization.

Bio: Reiko is a data analyst with the Government of British Columbia. In her role, she creates various tools using R to support her team's work in monitoring trends in population health and health inequalities in the province. When her computer is turned off, she splits her time as a recreational bassist, aspiring Star Baker and occasional trainspotter.

Sean Kross

Pronouns: he/him/his
The University of California, San Diego, CA

Session: Lightning Talks

Create a Personal Website in 5 Minutes with Postcards

Modern professional identities are scattered all over the web, split between several profiles, repositories, and applications. Professional networking sites aim to solve this issue; however, they often try to lock their users’ data into their ecosystem while also monetizing that data through methods that can be problematic. In a reaction against social and professional networking sites, people often create and host their own websites. However, creating a personal website comes with other costs, including the need to learn and maintain new technologies. To address these issues, the Postcards package is designed to allow people who are already comfortable with R Markdown to create a personal or project website from a single R Markdown document. In this lightning talk I will demonstrate how to create and deploy a Postcards website for free in less than five minutes.

Bio: Sean Kross is a graduate student at the University of California San Diego where he studies human computer interaction, data science as a practice, and digital education. His interests are centered around understanding challenges that people doing data science face in the real world, expanding online educational opportunities to new audiences, and building tools to make the future of work and learning possible. Sean is also a frequent consultant for data analysis and software development projects, in addition to being an advocate for open data and a maintainer of several popular open source software repositories.

Ted Laderas

Pronouns: he/him

Session: Lightning Talks

Using gRatitude to learn the tidyverse together

Fostering a supportive culture in the classroom is not just about instructor behavior. Students need to be supportive of each other as well. In this talk, I discuss a function-of-the-week assignment in our R Programming course that helped students support each other in their learning. From a list of lesser-known tidyverse functions deemed highly useful by R users, each student was tasked with learning about a function and presenting an example of how to use it to the other students. Throughout the presentations, fellow students were supportive of each other and grateful for the lessons learned. In the end, the assignment helped students learn more of the tidyverse, and each student generated a document that was shared online. I will end the talk by discussing how to use this assignment in other programming courses. Function presentations can be viewed here: https://sph-r-programming.netlify.app/functions/

Bio: I teach R and Data Science to a variety of audiences, including clinicians, statisticians, and basic scientists. I'm a co-founder of Cascadia-R, and I try to make learning less lonely through communities of practice.