David Smith

    Conference Master Of Ceremonies

    David is a Cloud Advocate for Microsoft, specializing in artificial intelligence and machine learning. Since 2009 he has been the editor of the Revolutions blog (http://blog.revolutionanalytics.com), where he writes regularly about applications of data science with a focus on R, and he is also a founding member of the R Consortium. Follow David on Twitter as @revodavid.

    Raphael Gottardo

    Keynote Speaker

    The Role of Data Science in Translational Research at Fred Hutch

    With recent advances in biomedical technologies, researchers at the Hutch (and elsewhere) are generating datasets at a scale we couldn’t even envision five years ago. On top of this, the amount of publicly available data that could be mined for additional biological information is growing at an exponential pace. At the same time, recent advances in statistics, machine learning and artificial intelligence are revolutionizing the way we think about data, and the way we analyze data. In November 2018, the Fred Hutch launched the Translational Data Science Integrated Research Center (TDS-IRC), a cross-divisional, collaborative research effort that will enable the Hutch to leverage these recent advances — and spur future innovation — in large-scale biological experiments, computational methods and infrastructure. During this presentation I will give an overview of current data science efforts at the Hutch and discuss future directions using concrete examples from our work on infectious diseases, vaccines and cancer immunotherapy.

    Biography

    Dr. Gottardo is a pioneer in developing and applying statistical methods and software tools to distill actionable insights from large and complex biological data sets. In partnership with scientists and clinicians, he works to understand such diseases as cancer, HIV, malaria, and tuberculosis and inform the development of vaccines and treatments. He is a leader in forming interdisciplinary collaborations across the Hutch, as well as nationally and internationally, to address important research questions, particularly in the areas of vaccine research, human immunology, and immunotherapy. As director of the Translational Data Science Integrated Research Center, he fosters interaction between the Hutch’s experimental and clinical researchers and their computational and quantitative science colleagues with the goal of transforming patient care through data-driven research. Dr. Gottardo partners closely with the cancer immunotherapy program at Fred Hutch to improve treatments. For example, his team is harnessing cutting-edge computational methods to determine how cancers evade immunotherapy. He has made significant contributions to vaccine research and is the principal investigator of the Vaccine and Immunology Statistical Center of the Collaboration for AIDS Vaccine Discovery.

    Gabriela de Queiroz

    Endnote Speaker

    Gabriela de Queiroz is a Sr. Developer Advocate/Sr. Engineering & Data Science Manager at IBM, where she leads the CODAIT Machine Learning Team. She works on several open source projects and is actively involved with a number of organizations to foster an inclusive community.

    She is the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 150 chapters in 45+ countries. She likes to mentor and share her knowledge through mentorship programs, tutorials and talks.

Session Talks

    Kate Hertweck

    R We There Yet? Building Communities of Practice Around R and Topics in Biology

    R meetups and communities are thriving in cities around the world, but even identifying other R users at your workplace can be surprisingly difficult. While it's possible to develop expert-level R coding skills in isolation, it's much easier (and far more fun!) to improve your coding skills in cooperative communities of practice, encompassing users with various skill levels who work on different types of problems. What does it take to develop communities of practice at an institution or company? How do you assess what members of a community need or prefer? In this talk, I'll discuss my experiences supporting emerging communities of practice for coding skills at a large non-profit organization with many R users. I'll identify common impediments to community development, but also provide specific recommendations for facilitating and encouraging investment and cohesion in cooperative learning.

    Biography:

    Kate Hertweck is the bioinformatics training manager at Fred Hutchinson Cancer Research Center, where they develop and teach courses on reproducible computational methods to researchers. Kate's graduate training at the University of Missouri in genomic evolution in monocotyledonous plants was followed by a postdoctoral fellowship at the National Evolutionary Synthesis Center (NESCent) at Duke University, where they fell in love with R and began working exclusively in computational biology and data science more broadly. Kate then spent four years as an assistant professor teaching bioinformatics, genomics, and plant taxonomy at the University of Texas at Tyler before deciding to focus more closely on training researchers. Kate has been involved in The Carpentries, a globally distributed non-profit organization that teaches reproducible computational methods, since 2014, serving as a member of community governance since 2016. When not being an overenthusiastic instructor, Kate likes to spend their time doing fiber arts (knitting, crochet) and enjoying all things science fiction.

    Robert Amezquita

    The Role of Data Science in Translational Cancer Research: From Desk, to Bench, to Bedside

    The guiding mission of Fred Hutch is the elimination of cancer. Over the course of more than 40 years, we have been redefining what's possible in cancer research, and translating these findings into cures. Fred Hutch has been a pioneer of immunotherapies that have improved patient outcomes in ways we have never seen before. However, these novel therapies provide robust cures in only a subset of patients with specific types of cancer. Thus, there remain significant challenges in deciphering how to extend these cures to all cancers and all patients. To overcome these challenges, the Hutch has created the Translational Data Science Integrated Research Center (TDS IRC) to harness joint advances in statistics, computational science, and biology through tight-knit collaboration. Here, we will discuss how we infuse data science throughout the bench-to-bedside discovery cycle to fuel new research opportunities.

    Biography:

    Robert Amezquita is a Postdoctoral Fellow in the Immunotherapy Integrated Research Center at Fred Hutch under the mentorship of Raphael Gottardo. His current research focuses on utilizing computational approaches leveraging transcriptional and epigenomic profiling at single-cell resolution to understand how novel anti-cancer therapeutics - ranging from small molecule therapies to biologics such as CAR-T cells - affect immune response dynamics. In particular, his work aims to better understand the process of immune cell differentiation under the duress of cancer as a means to inform future immunotherapies. To accomplish this, Robert works collaboratively across the center with experimental collaborators, extensively utilizing R for data analysis. Recently, Robert and colleagues published "Orchestrating Single-Cell Analysis with Bioconductor" on bioRxiv, a review focusing on single-cell RNA-seq analysis methods based in R.

    Heather Nolis and
    Sai Jyotsna

    How To Talk So Engineers Will Listen: R in Production at T-Mobile

    When we hired the very first data scientist for the AI @ T-Mobile team, nobody in our organization used R. Less than 8 weeks later, we had R models running in production environments. In lots of organizations, R isn't valued as a "real" programming language. At T-Mobile, by sitting data scientists and engineers together, we have cultivated an environment of mutual respect.

    In this talk, we will walk through our typical engineering product development workflow at T-Mobile, where the heart of the product is a model created in R. We will also cover the differences in how engineers and data scientists work. By doing so, we hope to empower R users to consider themselves as engineers and convey the vocabulary necessary to make engineers stop and listen.

    Biographies:

    Heather Nolis

    Heather began her academic career receiving a dual degree in French and Neuroscience with intent to pursue a PhD in molecular neuropharmacology. Once she realized how heavily that field relied on software built by other people, she pivoted - deciding to make software herself. Over her time in graduate school for Computer Science at Seattle University and working as a software engineer at T-Mobile, she's developed significant strength in machine learning, cloud computing, and proof-of-concept product development.

    Sai Jyotsna

    Sai Jyotsna is a Senior Software Engineer with extensive experience in computer science and technology engineering. She is currently working with Logic20/20 as part of the AI@T-Mobile team. She is passionate about using AI-ML in software products to augment customer service experiences.

    Bethany Yollin

    Creating Interactive GIS Applications with Shiny and Leaflet

    The Shiny web application framework by RStudio enables data scientists to create and share interactive data analytics. Since Shiny was introduced in 2012, countless open-source contributors have published packages that allow R developers to use JavaScript libraries in their Shiny applications. One such JavaScript library is Leaflet, a platform for creating interactive maps. This talk will introduce how Shiny, leaflet and other geospatial mapping packages can work together to create beautiful web applications capable of providing information-rich visualizations. This talk will also briefly touch on "Dockerizing" and deploying Shiny applications in a high-availability production environment using GCP (Google Cloud Platform). Lastly, if time permits, there will be an opportunity to demo some applications that put all these concepts and technologies together.
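
    As a rough illustration of how these pieces fit together (a minimal sketch, not code from the talk; the coordinates are arbitrary placeholders), a Shiny app that renders a leaflet map can be as small as this:

      # Minimal Shiny + leaflet sketch; assumes the shiny and leaflet packages are installed.
      library(shiny)
      library(leaflet)

      ui <- fluidPage(
        titlePanel("Minimal leaflet + Shiny example"),
        leafletOutput("map", height = 500)
      )

      server <- function(input, output, session) {
        output$map <- renderLeaflet({
          leaflet() %>%
            addTiles() %>%                                       # default OpenStreetMap basemap
            setView(lng = -122.33, lat = 47.61, zoom = 11) %>%   # placeholder view: Seattle
            addMarkers(lng = -122.33, lat = 47.61, popup = "Seattle")
        })
      }

      shinyApp(ui, server)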

    Biography:

    Bethany Yollin is a data scientist working in the transportation industry. With an educational background in geography and applied mathematics, she enjoys developing fun and informative web applications using Shiny. She lives in Seattle, Washington.

    Clara Yuan

    Surge Pricing: An Application of Segmented Regression in Marketplace Pricing

    Convoy operates a two-sided marketplace, in which we transact with both shippers and carriers. When shipper demand for shipment services increases quickly, the carrier supply reacts by raising prices. In order to limit Convoy’s exposure to an unexpectedly large gap between the price shippers pay us and the price we pay carriers, we need to understand at what shipment volume carriers will begin raising prices, and by how much. In this talk, I will describe how segmented regression is a natural fit for identifying the surge point - the point at which prices begin rising - and the surge premium - how much prices will rise by, as a function of volume.
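
    As a hedged sketch of the technique (simulated data and the segmented package used for illustration, not Convoy's actual pipeline), segmented regression estimates both the breakpoint and the post-breakpoint slope:

      # Segmented regression on simulated price/volume data.
      library(segmented)

      set.seed(42)
      surge <- data.frame(volume = seq(0, 100, length.out = 200))
      # Price is flat until a "surge point" at volume = 60, then rises linearly.
      surge$price <- 2 + 0.03 * pmax(surge$volume - 60, 0) + rnorm(200, sd = 0.05)

      base_fit <- lm(price ~ volume, data = surge)
      seg_fit  <- segmented(base_fit, seg.Z = ~ volume, psi = 50)  # psi = initial breakpoint guess

      seg_fit$psi     # estimated surge point (breakpoint) with standard error
      slope(seg_fit)  # slopes before/after the breakpoint: the surge premium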

    Biography:

    Clara Yuan is a data scientist at Convoy, where she works on economic research into the fundamental dynamics of Convoy's marketplace. She has a PhD in applied economics and a BS in operations research. She was introduced to R over 10 years ago and has never looked back.

    Edward P. Flinchem

    Bayesian NLP in R on Clinical Text: Predictions from Electronic Health Records

    Healthcare in the US follows a crisis-first, response-second pattern. Consequences include high costs and potentially avoidable human suffering. Alternatively, many envision healthcare adopting data-informed patterns, so as to predict poor outcomes and deliver care proactively, thereby mitigating future suffering, controlling costs, and providing better care for more people. The intervention side of that vision is vast in scope and not my topic here, but I will demonstrate that the predictive aspect is practical today, with concrete examples and simple, transparent models constructed and visualized in R and trained on text extracted from electronic health records (EHRs).

    The frontier of machine learning in healthcare is the analysis of unstructured text in EHRs. Naive Bayesian (NB) modeling is a core method for machine learning, well known in document classification, for example. I demonstrate the utility of NB models, with clinical text as input, to predict hospitalizations, emergency department usage, and mortality. I discuss simplicity and transparency as factors critical to gaining the support of stakeholders in adopting models.
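
    To make the approach concrete, here is a toy bag-of-words naive Bayes sketch with the e1071 package; the notes, labels and terms are invented for illustration and are not clinical data or the talk's models:

      library(e1071)

      notes  <- c("chest pain shortness of breath", "routine follow up no complaints",
                  "fall at home hip fracture",      "medication refill stable")
      labels <- factor(c("admitted", "not_admitted", "admitted", "not_admitted"))

      # Tiny bag-of-words features: does each note mention a given term?
      terms <- c("pain", "fracture", "stable", "routine")
      X <- as.data.frame(setNames(
        lapply(terms, function(t) factor(grepl(t, notes), levels = c(FALSE, TRUE))),
        terms
      ))

      fit <- naiveBayes(X, labels)
      predict(fit, X, type = "raw")  # posterior probability of each outcome per note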

    Biography:

    Ed Flinchem is a Principal Data Scientist with the DaVita Medical Group, a major provider of primary and specialty care in six states. Ed's focus is on developing predictive machine learning models to optimize value-based care delivery, with an emphasis on forecasting high-risk, high-complexity outcomes. Ed leverages both the structured and unstructured data (free text) of electronic health records in the service of providing better healthcare, for more people, at lower cost. Ed has served the DaVita Medical Group and its subsidiary, The Everett Clinic, since 2017.

    Over his 27-year career, Ed has served in industry, government, academic, and startup roles. As Chief Data Scientist at TurboPatent, Ed applied machine learning to the text of patent applications to predict rejections by the US Patent Office. Ed co-invented the predictive text input method T9, a product based on machine learning and one of the most widely distributed pieces of software in history, used daily by billions of people texting on their mobile phones. Prior to developing T9, Ed served in academic and government labs developing software to advance research and teaching in physical oceanography and geophysics, acquiring expertise in large-scale data analysis, statistics, geographical information systems, satellite remote sensing, fluid dynamics, and digital signal processing.

    Ed earned his B.A. in physics at Brown University in 1985, followed by graduate study in physical oceanography and geophysics at the University of Washington. He has authored over 20 publications, including 5 journal articles and 15 patent applications.

    Eina Ooka

    Time Series Forecasting with Keras: LSTM vs ConvNN

    When we look for examples of Long Short-Term Memory (LSTM) networks, they usually concern natural language processing. Similarly, Convolutional Neural Networks (ConvNN) usually concern image processing. The most popular applications of deep learning, in other words, are not time series forecasting. How can we effectively apply these methods to time series forecasting? To answer this, I have built hourly solar generation forecasts with different methods in Keras. As a practitioner in the power utility industry, I will talk about different deep learning architectures suitable for time series forecasting and how they compare to traditional statistical methods.
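
    For flavor, a minimal sketch (assumed input shapes, not the talk's actual models) of an LSTM and a 1-D convolutional model for sequence-to-one forecasting with the keras package might look like this:

      library(keras)

      n_timesteps <- 24   # e.g. 24 hourly lags
      n_features  <- 1

      lstm_model <- keras_model_sequential() %>%
        layer_lstm(units = 32, input_shape = c(n_timesteps, n_features)) %>%
        layer_dense(units = 1)

      conv_model <- keras_model_sequential() %>%
        layer_conv_1d(filters = 32, kernel_size = 3, activation = "relu",
                      input_shape = c(n_timesteps, n_features)) %>%
        layer_global_average_pooling_1d() %>%
        layer_dense(units = 1)

      lstm_model %>% compile(loss = "mse", optimizer = "adam")
      conv_model %>% compile(loss = "mse", optimizer = "adam")
      # Then: model %>% fit(x_train, y_train, ...) with arrays shaped
      # (samples, n_timesteps, n_features) and (samples, 1).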

    Biography:

    Eina Ooka is a senior quantitative analyst at The Energy Authority. She develops multivariate stochastic forecasting models for electric power markets and utility portfolios, utilizing both statistical and data science methods. She builds production-level models in R, all the way from R&D to Shiny app deployment, so they can be used throughout the company for portfolio management.

    Michael Frasco

    Deploying Machine Learning in R with Amazon SageMaker

    Too often, data scientists build potentially high-impact models in R on their personal laptops that never see the light of day in production due to deployment obstacles. At Convoy, we leverage machine learning to manage hundreds of marketplaces each day, so building a robust and frictionless platform to deploy models is critical to our success. Convoy uses Amazon SageMaker to minimize the amount of code that data scientists need to write to go from researching and training a model locally in R to deploying the same model in production. As a result, our team has ownership over the end-to-end machine learning pipeline and can rapidly deliver impact on the Convoy product. In this talk, we'll discuss the central challenge of deploying machine learning in production, how the Plumber package allows us to serve our models in production, and the benefits that Amazon SageMaker provides in this project.
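
    To show the general shape of model serving with plumber (a hedged sketch; the model object, feature names and port are placeholders, not Convoy's service), an API file can be as simple as:

      # plumber.R
      library(plumber)

      model <- readRDS("model.rds")  # assume a model trained and saved elsewhere

      #* Health check
      #* @get /ping
      function() {
        list(status = "ok")
      }

      #* Score one observation
      #* @post /predict
      #* @param x1:numeric
      #* @param x2:numeric
      function(x1, x2) {
        newdata <- data.frame(x1 = as.numeric(x1), x2 = as.numeric(x2))
        list(prediction = predict(model, newdata))
      }

      # Run locally with: plumber::plumb("plumber.R")$run(port = 8000)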

    Biography:

    Michael Frasco is a data scientist at Convoy, a private company in Seattle that operates a marketplace for trucking services. At Convoy, he builds internal tools that enable collaboration across the company and make other data scientists more productive. Michael received an MS in Statistics from the University of Chicago, where he fell in love with Bayesian statistics. In his free time, Michael enjoys watching basketball, exploring the Pacific Northwest, and reading books from the library.

    Kevin Kuo

    The latest drops from the TensorFlow + R ecosystem

    We provide a quick overview of the R interface to the TensorFlow ecosystem and move on to recent developments, including support for TF 2.0. We introduce the tfprobability package, which enables building probabilistic models, and demonstrate some applications.
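
    As a minimal taste (purely illustrative and assuming the tfprobability and tensorflow packages are installed and configured; this is not the talk's demo material), a distribution object can be created, sampled and evaluated like so:

      library(tfprobability)

      d <- tfd_normal(loc = 0, scale = 1)  # a standard normal distribution object
      tfd_sample(d, 5)                     # draw 5 samples (returned as a tensor)
      tfd_log_prob(d, 0)                   # log density evaluated at 0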

    Biography:

    Kevin is a software engineer at RStudio building open source packages for machine learning development and deployment.

    Bryan Mayer

    Reproducible Data Processing in Team Workflows with DataPackageR

    As data is cleaned and updated throughout a research project, it is easy for each user to generate their own versions. With many instances of data, confusion often surfaces over naming conventions (e.g., date and initial suffixes) and can result in multiple "final" versions, potentially hindering the analysis workflow and making reproducibility difficult. In this talk, I will motivate and demonstrate the use of DataPackageR: an R package that transforms the data processing pipeline into a version-controlled data package. With data packages generated by DataPackageR, raw data remains immutable, read-only input that is processed into analysis datasets once the package is built. To maintain reproducibility, pre-specified user-generated R Markdown scripts are rendered, simultaneously generating processed data and documentation vignettes. The data package version is also automatically updated with each build. The R dataset can then be consumed downstream, with the data version established, by the original author, teammates, or other researchers by simply installing and loading the data package. As part of the presentation, I will demonstrate a real-life application in which we package laboratory-generated HIV data for downstream clinical trial analysis.
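
    A hedged sketch of that workflow (file and dataset names here are hypothetical placeholders, not the HIV data package from the talk):

      library(DataPackageR)

      # 1. Create a data package skeleton whose processing script produces `mydata`.
      datapackage_skeleton(
        name           = "MyStudyData",
        path           = ".",
        code_files     = "process_raw.Rmd",   # R Markdown that reads raw data and builds `mydata`
        r_object_names = "mydata"
      )

      # 2. Build the package: renders the Rmd, stores `mydata`, writes documentation
      #    vignettes, and bumps the data version.
      package_build("MyStudyData")

      # 3. Downstream users then simply install the built package and call
      #    library(MyStudyData); data(mydata)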

    Biography:

    Bryan Mayer is a Staff Scientist at the Fred Hutchinson Cancer Research Center in Seattle, currently working on a team of biostatisticians that analyzes pre-clinical HIV vaccine trials. The team relies heavily on R-developed tools emphasizing best coding practices, reproducible research, and open access to data. In addition to his current work in HIV vaccine research, Bryan has been using R for over a decade on a variety of statistical and epidemiological applications in infectious disease research.

    Javier Luraschi

    Cluster Computing Made Easy with Spark and R

    Have you ever found yourself waiting hours for R to finish analyzing data, running out of memory, or spending hours fine-tuning your model parameters? Are you interested in using large datasets in R but are not sure where to start? Fear not, this talk will introduce you to the exciting world of cluster computing using Apache Spark with the ease of use of R.

    This talk will start by introducing techniques to make code faster and explain where Apache Spark fits. It will introduce Apache Spark and then the sparklyr package, which provides an interface to Apache Spark for R.

    You will learn how to install and use Spark from R with familiar packages like dplyr, DBI and broom. You will then learn how to make use of the modeling functionality available in Spark, as well as advanced functions like processing graphs, using other machine learning frameworks, processing real-time data, and running custom R code using new features introduced in sparklyr 1.0.0 and upcoming extensions currently in development.
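
    As a taste of that workflow (a minimal local sketch under assumed defaults; the mtcars example is arbitrary and not the talk's material):

      library(sparklyr)
      library(dplyr)

      # spark_install()                    # one-time download of a local Spark
      sc <- spark_connect(master = "local")

      cars_tbl <- copy_to(sc, mtcars, "cars", overwrite = TRUE)

      # dplyr verbs are translated to Spark SQL and executed on the cluster
      cars_tbl %>%
        group_by(cyl) %>%
        summarise(avg_mpg = mean(mpg, na.rm = TRUE))

      # Spark MLlib models via sparklyr
      fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)
      summary(fit)

      spark_disconnect(sc)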

    This talk should be of interest to new users who are unfamiliar with cluster computing, intermediate users who have used Apache Spark with R, and advanced users interested in learning the latest best practices and features available in Spark with R.

    Biography:

    Javier is a software engineer with experience in technologies ranging from desktop, web, mobile and backend development to augmented reality and deep learning applications. He previously worked at Microsoft Research and SAP and holds a double degree in Mathematics and Software Engineering. Javier is the creator of sparklyr, r2d3, cloudml and other R packages.

    Gagandeep Singh

    Building Data Science Infrastructure at Enterprise Level

    Modern-day organizations are spending a considerable amount of resources on data science research and development, and the need for a dedicated in-house data science department has increased exponentially. Companies are looking to external partners with dedicated competence and experience in this field to assist in building a comprehensive ‘one place, all tools’ solution. We, as data science specialty consultants, have established partnerships with popular data science platform providers like RStudio and JupyterHub. We were brought in by a leading multinational biotechnology company to design, develop and deploy an integrated data science development platform for their team of over 100 data scientists. The ask was to build a comprehensive solution where users can use either R or Python to develop models and share results through a common platform.

    The biggest concern for us was to provide a solution that could handle a multitude of user sessions while also providing high-performance computing and resources. The safe option was to build a high-availability, load-balanced environment, though that would create trouble down the road as the number of users keeps increasing and resources need to be optimized. We decided to take a two-pronged approach, where a Kubernetes-backed containerized solution is the primary interface and a load-balanced product backs up the additional load. Users launch their own containers for each processing session, and Kubernetes takes care of backend resource allocation. They can run both Jupyter notebooks (through a Python IDE) and R scripts (through an R IDE) in the container and perform multiple assignments concurrently. The publishing platform provides a cohesive product to share results through Shiny applications, HTML reports, or even Python code. Connect is enabled to run both Python and R, and it has also been configured to schedule reports to be sent as emails.

    We have also built the R IDE in a load-balanced, high-availability environment. Here the R IDE’s internal load balancer works with AWS’s load balancer to accommodate backup and smooth operations in case any of the servers goes down. The publishing platform is also configured with high availability, which means multiple servers simultaneously serve users’ publishing needs by using a common database. We have also integrated a high-availability Package Manager into the mix, which enables the administrator to establish control over package access and downloads. Users can also utilize Package Manager to access different versions of R packages. Our instance of Package Manager is also capable of serving internally developed packages by connecting to the original git source, which eliminates the need for additional administration.

    Biography:

    Gagandeep is a Senior Data Scientist at ProCogia, a Seattle-based data science consulting firm. An avid user of R since his graduate school days, he has developed R-based data science solutions for multiple Fortune 500 clients. He is a certified RStudio Administrative Professional and a regular contributor to the Eastside Seattle useR meetup. Through his talk at Cascadia, Gagandeep wants to evangelize the strength of R and related products as a production-ready enterprise solution.

Lightning Talks

    Brittany Barker

    Modeling in R to safeguard U.S. agricultural and natural resources from invasive pests

    A primary goal of the US Department of Agriculture (USDA) is to safeguard US agricultural and natural resources through early detection of exotic plant pests and weeds. We have developed a spatial modeling platform in R for the prediction of life cycle events (phenology) and climate suitability of invasive insect species in the continental US. This platform combines gridded weather/climate data with insect temperature response parameters to produce maps that can depict the potential range of a species and the timing of pest events (e.g., when adults will emerge and start laying eggs). Products from the model will be used to guide USDA-supported trapping programs for at least 16 insect species. We plan to place a version of the model online and to share the source code as open-source software.
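
    One common building block of insect phenology models is degree-day accumulation above a lower developmental threshold. A simplified, non-spatial sketch (invented thresholds and temperatures, not the platform's actual code):

      lower_threshold <- 10    # degrees C, hypothetical lower developmental threshold
      event_dd        <- 250   # hypothetical degree-days required for adult emergence

      set.seed(1)
      daily_mean_temp <- 8 + 12 * sin(seq(0, pi, length.out = 180)) + rnorm(180)

      daily_dd      <- pmax(daily_mean_temp - lower_threshold, 0)  # simple average method
      cumulative_dd <- cumsum(daily_dd)

      # First day of the season on which the emergence threshold is reached
      which(cumulative_dd >= event_dd)[1]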

    Biography:

    I use R and other bioinformatic tools to study how animal and plant population dynamics are influenced by environmental changes resulting from climate, land use, fires, and invasive species. My current research focuses on supporting agriculture by developing climate-driven models for several insect pests. I am currently a Research Associate with the Integrated Plant Protection Center in the College of Agricultural Sciences at Oregon State University. Previously, I worked as an Ecologist with the US Geological Survey in Boise, Idaho, and I completed a postdoctoral fellowship in the Ecology and Evolutionary Biology Department at the University of Arizona.

    Joseph Scheidt

    Improving Performance Metrics with R

    In many situations, we are limited in our choices of metrics to evaluate performance. Often, the metrics we do have are flawed, as they are impacted by factors that are not performance-related. Using my development of Student Independent Performance as an example, I will show the steps for using linear or logistic regression models in R to improve performance metrics by adjusting for these factors.

    Student Independent Performance is a simple metric I created to better compare schools using standardized test scores. As a school's average test score is impacted heavily by the racial and class demographics of that school, I used logistic regression to filter out the variance in test scores caused by student demographics. What remains is a measure that compares the performance of schools on standardized tests independent of the demographics of their student bodies.
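
    A hedged sketch of the general idea (not the project's actual code or data; the column names are hypothetical): model the pass rate as a function of demographics, then treat the gap between observed and expected rates as demographic-adjusted performance.

      schools <- data.frame(
        passed    = c(180, 140, 210, 90),      # students passing the test
        tested    = c(300, 250, 320, 200),     # students tested
        pct_frl   = c(0.35, 0.60, 0.20, 0.80), # share on free/reduced lunch
        pct_white = c(0.50, 0.30, 0.70, 0.20)
      )

      fit <- glm(cbind(passed, tested - passed) ~ pct_frl + pct_white,
                 family = binomial, data = schools)

      expected <- predict(fit, type = "response")   # demographically expected pass rate
      observed <- schools$passed / schools$tested
      schools$adjusted <- observed - expected       # analogous to Student Independent Performance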

    The project is hosted at https://github.com/josephscheidt/sip

    Biography:

    Joe is an operations manager for a transportation company, where he specializes in developing metrics to track and improve company performance. He is also pursuing a master's degree in data science from Johns Hopkins University. He spends way too much of his spare time using R to explore baseball statistics and to be grumpy about the Mariners.

    Scott Came

    Analyzing Legislative Activity with R

    During the 2019 session of the Washington Legislature, I used R to curate a dataset of bills and roll call vote results (i.e., capturing how each Representative and Senator voted on each bill). This involved using httr and the tidyverse to harvest data from the Legislative Service Center's XML data feed, and munge the data into data frames suitable for analysis and visualization. I used the sf package and ggplot2 to create a cartogram of how legislators voted, and also performed analyses of caucus loyalty. I published results/visualizations periodically throughout the session on my Twitter timeline (@scottcame).

    This talk will provide a very quick walkthrough of the R code and a glimpse of some of the results.
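
    For orientation, here is a generic sketch of the harvesting pattern described above (the URL and XML element names are placeholders, not the Legislative Service Center's actual feed or the talk's code):

      library(httr)
      library(xml2)
      library(dplyr)

      resp <- GET("https://example.org/rollcalls.xml")              # placeholder endpoint
      doc  <- read_xml(content(resp, as = "text", encoding = "UTF-8"))

      vote_nodes <- xml_find_all(doc, ".//vote")                    # hypothetical element name
      votes <- tibble(
        member = xml_text(xml_find_first(vote_nodes, ".//member")),
        choice = xml_text(xml_find_first(vote_nodes, ".//choice"))
      )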

    Biography:

    Scott Came is the principal consultant in Olympia-based Cascadia Analytics LLC, where he provides data engineering and data science consulting services to clients in the Northwest and nationally. His 25-year career has spanned software engineering, data science, and executive management, mostly in the public and non-profit sectors. He also enjoys analyzing and visualizing data to explore interesting topics in sport (baseball mostly), elections, politics, and education policy.

    Tiernan Martin

    DRAKE-AGE: Lessons Learned While Package-ing {drake}

    ROpenSci’s {drake} package is a workflow management toolkit that cuts down time and rewards well-organized projects with reliable reproducibility. This talk shares the author’s experience of drowning in one project’s workflow complexity and finding a lifeline in the pairing of {drake} and R’s package framework. Talk content will include best practices, lessons learned (the hard way), and opportunities for improvement to this killer combination; a minimal plan sketch follows the links below.

    • ROpenSci's {drake} https://ropensci.github.io/drake/
    • Speaker's {drakepkg} https://github.com/tiernanmartin/drakepkg (WIP)
    • Example of a project that so desperately needed {drake} to save the day:
      https://github.com/tiernanmartin/NeighborhoodChangeTypology
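
    A minimal {drake} plan, sketched here for illustration (not the speaker's project; clean_data() and fit_model() are hypothetical helpers):

      library(drake)

      clean_data <- function(raw) na.omit(raw)
      fit_model  <- function(dat) lm(dist ~ speed, data = dat)

      plan <- drake_plan(
        raw     = cars,                 # built-in dataset as a stand-in for raw input
        cleaned = clean_data(raw),
        model   = fit_model(cleaned),
        summary = summary(model)
      )

      make(plan)      # builds only what is out of date
      readd(summary)  # retrieve a cached target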

    Biography:

    Tiernan Martin is a data analyst and program manager at Futurewise, an urban planning policy organization based in Seattle, Washington. His research and analytical work spans a variety of topics including gentrification, affordable housing, and bicycle/pedestrian transportation.

    tw: @maynardandking

    Dror Berel

    Scope Creep and Other Software Design Lessons Learned the Hard Way...

    5 lessons on how to deal with scope creep, from a data-science / machine learning perspective.

    • Lesson #1: Begin at the end! Define what your scope is. Do you need to extend it?
    • Lesson #2: Do not reinvent the wheel! There are other experts that know how to do it better than you!
    • Lesson #3: Found a gap? Be creative, but keep it simple!
    • Lesson #4: Do not be afraid to refactor!
    • Lesson #5: Go to Lesson #1

    A couple of case studies will be demonstrated in the context of machine learning and genomic data analysis. More details can be found in my recent blog post: https://medium.com/@drorberel/scope-creep-and-other-software-design-lessons-learned-the-hard-way-edacf021965b

    Biography:

    Dror Berel enjoys analyzing and exploring data, especially using his favorite open-source tool: R. With a background in statistics and biology, he was an early adopter of R and started using it nearly 18 years ago; he's thrilled to see how it's grown since then. Through work in biostatistics cores at several medical research institutes, he gained experience collecting and analyzing clinical and public-health data, communicating results to collaborators, and publishing in peer-reviewed journals.

    Jacqueline Nolis

    Adding shine to Shiny: improving the look of your UI

    Shiny is a package that makes it incredibly easy to create a graphical user interface around R code and quickly share prototypes with colleagues. By default, Shiny apps all end up with the same grey-and-blue look, which is boring and often comes across as sloppy. From themes to editing CSS files and even creating your own website templates, there are many ways to improve the styling and design of your Shiny app. In this talk I'll walk through these methods, their pros, and their pitfalls. By the end, you should be ready to add some pizzazz to your apps!
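
    A quick sketch of two of the approaches mentioned (a prebuilt theme plus a dab of custom CSS; the colors, selectors and shinythemes choice are arbitrary examples, not the talk's code):

      library(shiny)
      library(shinythemes)

      ui <- fluidPage(
        theme = shinytheme("flatly"),                  # swap the default grey/blue look
        tags$head(tags$style(HTML("
          h2 { color: #2c3e50; font-weight: 700; }     /* custom CSS on top of the theme */
        "))),
        titlePanel("A slightly shinier Shiny app"),
        sliderInput("n", "Sample size", 10, 500, 100),
        plotOutput("hist")
      )

      server <- function(input, output, session) {
        output$hist <- renderPlot(hist(rnorm(input$n)))
      }

      shinyApp(ui, server)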

    Biography:

    Dr. Jacqueline Nolis is the Principal Data Scientist at Nolis, LLC. She has over a decade of experience in the data science industry, working with companies ranging from DSW and Union Bank to T-Mobile and Airbnb. Her academic research covered optimization under uncertainty with a specialization in electric vehicle routing, which yielded a PhD in Industrial Engineering from ASU. Previously, Jacqueline was the Director of Insights and Analytics at Lenati and a Lead of Advanced Analytics at Promontory Financial Group.

    M. Edward (Ed) Borasky

    Archetypal Ballers and Ternary Plots - Evaluating Basketball Players via Unsupervised Learning

    Archetypal analysis is a dimensionality reduction technique that reduces an 18-dimensional box score dataset down to three dimensions by representing each player's skills as a linear combination of the skills of the extreme players - the best and the worst. Combined with ternary plot visualization, archetypal analysis offers both insight and quantitative evaluation of players and teams.

    This talk describes a new R package for archetypal analysis of men's and women's college and professional basketball players, using data obtained from the web, with examples from the 2019 NCAA "March Madness" tournaments, the 2018-2019 NBA season and the 2018 WNBA season.
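
    A hedged sketch of the general technique (not the speaker's new package): fit three archetypes to simulated player statistics with the archetypes package, then plot each player's mixture weights on a ternary diagram with ggtern.

      library(archetypes)
      library(ggtern)

      set.seed(7)
      stats <- matrix(rnorm(200 * 5), ncol = 5,
                      dimnames = list(NULL, paste0("stat", 1:5)))  # stand-in box score stats

      aa <- archetypes(stats, k = 3)        # three extreme "archetypal" players
      weights <- as.data.frame(coef(aa))    # each player's mix of the three archetypes
      names(weights) <- c("A1", "A2", "A3")

      ggtern(weights, aes(x = A1, y = A2, z = A3)) +
        geom_point(alpha = 0.6) +
        labs(title = "Players as mixtures of three archetypes")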

    Biography:

    M. Edward (Ed) Borasky is a retired scientific applications / operating systems programmer and open source aficionado. He has been using Linux and R for almost 20 years. He is a volunteer at Hack Oregon, where he builds transit operations databases and APIs.

    Mark Druffel

    Bootstrapping Business / Data Transformation with R

    Companies routinely lack the necessary information to make a confident decision - they instead rely on experience and anecdotes because supporting data and analytics aren’t readily available. Often these same companies hire teams of data analysts and data scientists to provide context to decision makers, but they do so without a strategy, infrastructure, or IT support – much less refined requirements to execute against. This leads to analysts and data scientists executing full-stack work while trying to manage the expectations of their business partners, who rarely understand the complexities.

    Our team spent the last year applying a new framework to our internal operations to enable our leadership team to leverage data and analytics in their process. I will walk through this framework at a high level and discuss how we were able to use it as a tool to communicate with the business. Further, I’ll discuss how we were able to plug R into the process to bootstrap solutions that are scalable and lightweight.

    Biography:

    Mark Druffel is a Consultant at Propeller, a boutique consulting firm with offices in Portland, San Francisco, and Denver. As a consultant, Mark works with industry clients to enable their organizations to better leverage data for their businesses. Mark has worked in several industries and roles and has been an active member of the R community since 2015.

    Ryan Hafen

    Visualizing geo-temporal data in R - geofacet and geovis

    A common type of data encountered in data analysis is geo-temporal, where data is observed for different geographic regions at different points in time. In this talk I will introduce two R packages that help visualize this type of data, geofacet and geovis. The geofacet package provides a mechanism to easily arrange a grid of time series (or any other visualizations) according to the underlying geography. The geovis package produces geographical maps that allow interactive viewing of geographies at multiple resolutions (e.g. country, state, municipality) and over time. I will illustrate the use of the packages using data from 14 million birth records in Brazil.
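
    As a tiny illustration of the geofacet idea (not the talk's Brazil example), the package's bundled state_unemp data can be faceted by state on a US-shaped grid:

      library(ggplot2)
      library(geofacet)

      ggplot(state_unemp, aes(year, rate)) +
        geom_line() +
        facet_geo(~ state, grid = "us_state_grid2") +
        labs(x = NULL, y = "Unemployment rate",
             title = "One small time series per state, arranged geographically")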

    Biography:

    Ryan Hafen is a data scientist working as an independent consultant. He works on tools, methodology, and applications in exploratory analysis, visualization, computational statistics, statistical model building, and machine learning on large, complex datasets. Ryan is active in the data science open source community, mainly working on projects in R and JavaScript.