Joseph Scheidt

6/8/2019

Sometimes the available data offer a poor basis for comparison because performance is clouded by non-relevant factors.

Examples:

- Equipment maintenance costs influenced by usage amount
- Employee product knowledge dependent on time with company

The goal is to remove the variance caused by non-relevant factors, giving us a better way to judge performance.

- Experience with statistics not required

- Few metrics other than standardized test scores available
- Standardized test scores heavily dependent on socioeconomic makeup of student body

- Using linear regression, we can remove variance due to racial and economic factors

```
# Create a linear model with lm()
# General form: lm(formula = my_metric ~ nonrelevant_factors, data = my_data)
model <- lm(formula = avg_test_score ~ white + asian + free_disc_lunch,
            data = school_data)
# Check the p-values of the coefficients (R-squared doesn't need to be high)
summary(model)
```

```
Call:
lm(formula = avg_test_score ~ white + asian + free_disc_lunch,
    data = school_data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.72201 -0.09447  0.00962  0.11337  0.51785

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.60201    0.03333  18.063   <2e-16 ***
white            0.28315    0.03292   8.602   <2e-16 ***
asian            0.59982    0.06869   8.733   <2e-16 ***
free_disc_lunch -0.61315    0.02997 -20.461   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1663 on 1058 degrees of freedom
Multiple R-squared:  0.623,  Adjusted R-squared:  0.6219
F-statistic: 582.7 on 3 and 1058 DF,  p-value: < 2.2e-16
```
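The p-value check can also be done programmatically rather than by reading the printed summary. The following is a minimal sketch on simulated data: `toy_data`, `toy_model`, and every number in it are assumptions made for illustration and are not the school data set.

```
# Simulated stand-in for school_data; all values are invented for illustration
set.seed(42)
n <- 200
toy_data <- data.frame(
  white = runif(n),
  free_disc_lunch = runif(n)
)
toy_data$avg_test_score <- 0.6 + 0.3 * toy_data$white -
  0.6 * toy_data$free_disc_lunch + rnorm(n, sd = 0.1)

toy_model <- lm(avg_test_score ~ white + free_disc_lunch, data = toy_data)

# summary() stores the coefficient table as a matrix; pull out the p-value column
coef_table <- coef(summary(toy_model))
p_values <- coef_table[, "Pr(>|t|)"]
print(p_values)

# R-squared is stored on the summary object as well
r_squared <- summary(toy_model)$r.squared
print(r_squared)
```

Pulling the values out this way is handy when the same check has to be repeated for several metrics.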

```
# Add the predicted score as a column of the data frame
school_data$expected_score <- predict(model, school_data, type = "response")
# Create the new metric by comparing the actual score to the predicted score
school_data$new_metric <- school_data$avg_test_score - school_data$expected_score
```
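The new metric is the actual score minus the predicted score, which is the residual of the linear model. As a minimal sketch on simulated data (`toy_data` and `toy_model` are hypothetical names, and the numbers are invented), the manual subtraction can be compared against R's built-in `residuals()`:

```
# Simulated stand-in for school_data; all values are invented for illustration
set.seed(1)
toy_data <- data.frame(free_disc_lunch = runif(100))
toy_data$avg_test_score <- 0.8 - 0.5 * toy_data$free_disc_lunch +
  rnorm(100, sd = 0.1)

toy_model <- lm(avg_test_score ~ free_disc_lunch, data = toy_data)

# Manual version: actual score minus predicted score
manual <- toy_data$avg_test_score - predict(toy_model, toy_data)

# Built-in version: residuals of the fitted model
shortcut <- residuals(toy_model)

# The two agree up to floating-point error
same <- isTRUE(all.equal(unname(manual), unname(shortcut)))
print(same)

# With an intercept in the model, the residuals average out to about zero,
# so a positive value means "better than expected" and a negative value "worse"
print(mean(shortcut))
```

One caveat with the shortcut: if the data contain missing values, `lm()` drops those rows by default, so `residuals()` can be shorter than the data frame. The `predict()` approach keeps one value per row.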

```
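library(dplyr)  # provides %>%, top_n(), and select()
# Show the 10 schools with the highest new_metric
top_n(school_data, 10, new_metric) %>%
  select(school, district, avg_test_score, new_metric)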
```

```
# A tibble: 10 x 4
   school               district                  avg_test_score new_metric
   <chr>                <chr>                              <dbl>      <dbl>
 1 Nooksack Elementary  Nooksack Valley School D…          0.863      0.415
 2 Hamilton Elementary  Port Angeles School Dist…          0.846      0.371
 3 Evergreen Elementary Bethel School District             0.839      0.518
 4 Moxee Elementary     East Valley School Distr…          0.741      0.400
 5 Chester H Thompson … Bethel School District             0.734      0.511
 6 Gildo Rey Elementar… Auburn School District             0.729      0.510
 7 Garfield Elementary… Everett School District            0.722      0.411
 8 Paterson Elementary… Paterson School District           0.706      0.515
 9 Stanley              Tacoma Public Schools              0.65       0.423
10 Union Gap School     Union Gap School District          0.48       0.390
```
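The rows above are the ten largest values of `new_metric`, but they are not ordered by it. Here is a minimal base R sketch of ranking by the new metric. The data frame `toy_schools` and its values are hypothetical and chosen only to show the sorting step.

```
# Hypothetical schools and metric values, invented for illustration
toy_schools <- data.frame(
  school = c("A", "B", "C", "D", "E"),
  new_metric = c(0.10, 0.42, -0.05, 0.31, 0.27),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest new_metric, then keep the first three rows
ranked <- toy_schools[order(toy_schools$new_metric, decreasing = TRUE), ]
top_3 <- head(ranked, 3)
print(top_3)
```

Sorting the other way (dropping `decreasing = TRUE`) lists the schools that fall furthest below their expected score.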