Nikhita Damaraju

R-tificial intelligence: A guide to using R for ML

Regular talk, 9:55-10:10

If you've ever been told that “R is not for ML” or found the transition from data processing to ML algorithms cumbersome, this session is for you. The discussion begins by acknowledging the statistical foundations of R users in academia, often rooted in statistics classes. However, I will discuss the need for ML skills, emphasizing that ML algorithms can sometimes offer superior solutions to real-world problems compared to traditional statistical approaches. Through compelling examples, such as predicting heart attacks, forecasting machine failures, and recommending products to customers, I will highlight scenarios where ML can outshine conventional statistical methods. The talk is structured into three main parts, each designed to equip participants with the knowledge and skills to harness the predictive modeling capabilities of R. The first part delves into the rich landscape of R packages tailored for ML tasks. Classification, regression, and clustering are explored within the broader categories of supervised and unsupervised learning. Participants will gain insights into selecting the right packages for specific tasks, fostering an understanding of the versatility R brings to ML endeavors. Moving to the second part, the discussion transitions to general best practices for data processing and preparation. Emphasis is placed on steps such as creating training and validation datasets, along with practical tips and techniques for preparing data for ML model training. The final part of the talk focuses on the evaluation of ML models using an example dataset. I will discuss the process of assessing model performance and making informed decisions based on the results. Additionally, I will offer suggestions for effective visualization techniques, enabling participants to communicate their findings in a clear and compelling manner to their teams.
By the end of this session, participants will not only understand the seamless integration of data preprocessing, model training, and evaluation in R but also be equipped with practical knowledge to navigate the ML landscape using R packages.

Nikhita Damaraju
Pronouns: she/her
Seattle, WA
Nikhita is pursuing her PhD at the University of Washington's Institute for Public Health Genetics. Before this, she completed an MS in Biostatistics from Columbia University and a dual degree (BS-MS) from the Department of Biotechnology at the Indian Institute of Technology Madras. Over the past six years, through a mix of coursework, teaching, and research projects, Nikhita has developed experience using various R packages for tasks like data wrangling, machine learning, bioinformatics, and data visualization. She is deeply interested in the intersection of Data Science and Biology within the realm of Public Health and aspires to contribute to the field of Precision Medicine in the future.