---
title: "WNBA Adjusted Plus-Minus Models"
date: "June 15, 2021"
author: "Team Dean Oliver \nBrad Hymas, Cameron Jones, Cecelia Fu, Chase Cardon, \nDavid Teuscher, Sam Francis, and Tanner Darm"
header-includes:
- \usepackage{setspace}\doublespacing
output: pdf_document
bibliography: AdjustedPlusMins.bibtex
---
---
nocite: |
@sportsref, @tidyverse, @mass, @patchwork, @broom, @R, @jacobs2017
...
# Background
Hall of Fame NBA player and coach Phil Jackson once said, “The strength of the team is each individual member. The strength of each member is the team.” Fans, coaches, and players alike would agree that successful basketball teams are the ones who have strong individual players who work well together as a team to win games.
A player’s plus-minus rating is one of many basketball statistics that seek to quantify the value of a player. Specifically, the plus-minus rating was developed to identify a player’s contribution while they are on the court. Its calculation is simple: take the change in the score differential (the team’s points minus the opponent’s points) from the moment a player enters the game to the moment they exit, and that change is their +/- rating.
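To make the calculation concrete, here is a minimal sketch in R; the stint table, its column names, and the numbers in it are invented purely for illustration and are not part of our data.
```{r, eval=FALSE}
library(tidyverse)

# Hypothetical stints: one row per stretch a player is on the court, recording
# her team's score and the opponent's score when she enters and when she exits.
stints <- tribble(
  ~player,    ~team_in, ~team_out, ~opp_in, ~opp_out,
  "Player A",        0,        18,       0,      12,
  "Player A",       25,        40,      20,      38,
  "Player B",        0,        10,       0,      15
)

# Raw plus-minus: points scored minus points allowed while on the court,
# summed over all of a player's stints.
stints %>%
  mutate(stint_pm = (team_out - team_in) - (opp_out - opp_in)) %>%
  group_by(player) %>%
  summarize(plus_minus = sum(stint_pm))
```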
This rating is useful because it measures a player’s value to the team beyond simply how many points they score, how many baskets they assist, or how often they make free throws.
A player’s overall value is normally hard to quantify because it is impossible to count how many points defensive players prevent the other team from scoring. A way around this is to measure the change in each team’s scoring rate when different combinations of players are on the court. In basketball, the fundamental objective is to maintain a higher scoring rate throughout the game than your opponent, and tracking an individual player’s effect on that scoring rate is a good way to measure that player’s overall value. It is not the only valid method for measuring value, but it makes a lot of sense from a holistic perspective.
Adjusted plus-minus is similar to plus-minus in that it uses the net change in score when a player is on the court; however, APM seeks to go a step further than this. Other important aspects of the game are taken into consideration, such as the aggregate contribution of the other players on the court and the existence of a team’s home court advantage.
One of the reasons that we must use an adjusted plus-minus is to see which players are valuable in their own right and which are simply being carried by their team. Plus-minus ratings alone are not enough to tell us this; a player could have a high plus-minus rating simply because they were on the court at the same time as very good players, or their rating could be high because they played only a few minutes and those minutes happened to be a good run for the team.
Manu Ginobili is an example of a player who has a high plus-minus rating because of his team and not necessarily because of his own extraordinary contribution on the court. Ginobili has the fifth-highest non-adjusted plus-minus rating, better than Kobe Bryant, Shaquille O’Neal, or Steph Curry. Even though Ginobili won four NBA championships, it would be difficult to argue that he is better than those league MVPs. Ginobili could be seen most games coming off the bench to support his team later in the game. He also played with 15-time All-Star Tim Duncan, six-time All-Star Tony Parker, and five-time All-Star and two-time Defensive Player of the Year Kawhi Leonard. If Ginobili was on the floor with those players, he was likely to have a high plus-minus, but his high rating would be a result of the All-Stars he shared the court with, not so much his individual contribution.
Draymond Green is a prime example of how adjusted plus-minus ratings can reveal the extraordinary contributions of players who are otherwise seen as average or comparatively unimportant to a team. Green, a player on the Golden State Warriors, scored only four points in a 2017 game against the Grizzlies, but he also had ten assists, eleven rebounds, and ten steals, and he finished with the highest adjusted plus-minus of anyone in the game despite scoring so few points. An adjusted plus-minus statistic therefore shows how useful a player truly is when they are on the court, and how much statistics besides points scored matter.
The LeBron James vs. Michael Jordan debate[@unk2020] is another area where an adjusted plus-minus model could bring a lot of context. Measured simply by points per game, Jordan is the winner with 30.1 compared to LeBron James’ 27.0. However, James averages more rebounds and assists and is a better 3-point shooter, so there is a lot to consider when deciding who is the better all-around player. Michael Jordan finished his career with a Box Plus/Minus (BPM) of 9.21, while LeBron James currently has a BPM of 8.89. It is easy to pick and choose statistics that simply support one’s pre-existing belief, but in an argument over who actually provides the most value to their team as a whole, adjusted plus-minus becomes a very important metric.
```{r, echo=FALSE, message=FALSE, warning=FALSE, out.width='67%', fig.align='center'}
library(tidyverse)
# Game-level turnovers vs. raw plus-minus for 2019 WNBA player performances
dat <- read.csv("Data/turnovers.csv")
ggplot(dat, aes(x = Turnovers, y = PlusMinus)) +
  geom_point() +
  theme_minimal() +
  labs(y = "Plus-Minus")
```
In addition to being heavily dependent on teammates and opponents, traditional plus-minus scores appear nearly random, with little correlation to traditional box score stats. The figure above compares the total turnovers in a game with the plus-minus rating from the same game across over 4,000 player performances from the 2019 WNBA season. Even as a player commits more turnovers, an event that should certainly push plus-minus lower, there is little visible effect on the player’s plus-minus. Ultimately, across all of these player performances, no common basketball statistic was strongly correlated with plus-minus. Looking at points, rebounds, assists, steals, blocks, fouls, turnovers, and more, the best-correlated stat was points, with a correlation coefficient of 0.229. We determined that we must in fact create a model to establish an adjusted plus-minus metric.
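As a rough sketch of how that correlation check could be reproduced, the chunk below computes the correlation of each box score stat with raw plus-minus; the file name and column names here are assumptions for illustration rather than the exact layout of our cleaned data.
```{r, eval=FALSE}
# Hypothetical player-game file: one row per 2019 player performance, with box
# score columns and the raw plus-minus for that game.
box <- read.csv("Data/box_scores_2019.csv")

# Correlation of each box score statistic with raw plus-minus.
stats <- c("Points", "Rebounds", "Assists", "Steals", "Blocks", "Fouls", "Turnovers")
sapply(stats, function(s) cor(box[[s]], box$PlusMinus, use = "complete.obs"))
```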
A well-known analogy for building a team’s strategy around a holistic statistic like plus-minus appears in the movie Moneyball[@moneyball2011], directed by Bennett Miller. When assembling their roster for the upcoming season, the protagonists look at how often baseball players get on base rather than at traditional statistics such as batting average. The team’s statistician recognized that, while baseball games are decided by runs, a player’s value to his team could be measured even more effectively by how often he actually got on base, and that metric drove their player selection. This novel approach allowed them to recruit players who would normally have been overlooked based on traditional metrics, and it revolutionized recruiting in Major League Baseball. While there has not been as dramatic a storyline surrounding plus-minus models in basketball, the principle of using more holistic statistics to gauge player value is the same when you compare plus-minus to things like free throws or average points per game.
Ultimately, by understanding the plus-minus model coefficients for the WNBA, we can see which players are the best “team players” in the league. A better picture of a player’s overall value is helpful when teams make decisions such as setting player salaries or shaping team strategy.
# Previous Work
There has been interest in adjusted plus-minus models for a number of years. The idea of an adjusted plus-minus began with Winston and Sagarin[@winsag], who developed WinVal, which was used by the Dallas Mavericks to rank and value players. Since Winston and Sagarin were working for the Mavericks, their work was not publicly released. In 2004, Dan Rosenbaum[@rosenbaum2004] implemented the same idea and released his work publicly. Rosenbaum drew from a player's plus-minus but also accounted for the quality of the teammates that a player played with. The adjusted plus-minus was calculated by performing a regression in which each player is represented by a binary variable indicating whether or not they were on the court.
Later work focused on improvements to the initial versions of APM. In 2010, Joe Sill presented work on Regularized Adjusted +/- (RAPM)[@sill2010], focusing on solving issues with unstable and unreliable APM estimates caused by multicollinearity and small amounts of playing time. RAPM uses ridge regression, which applies a penalty that shrinks coefficient estimates towards zero and provides more stable results, especially in the presence of multicollinearity. RAPM has a number of advantages over APM: it allows each player to have a reliable estimate even if they played few minutes, and it requires less data to reach reliable estimates. Sill[@sill2010] found that three years of data were needed to get APM estimates as reliable as RAPM estimates from a single year of data. In his paper, Sill also proposes the use of Bayesian methods to improve adjusted plus-minus models.
Engelmann[@engelmann2017] suggested additional variables that could be added to an APM model, such as days of rest, distance traveled, or even the score difference in the game. Previous research by Engelmann and others had shown that players play worse the further ahead they are, which could also be considered in APM models. Engelmann has also shown that separate offensive and defensive APM estimates can be obtained to determine how influential a player is on both sides of the ball. The most recent research on adjusted plus-minus models has focused on incorporating player tracking data that is now readily available. ESPN's Real Plus-Minus[@ESPN] has been updated to include player tracking data, which provides better information about how players impact plays and not simply whether or not they are on the court.
# Data
The play-by-play data was web scraped from ESPN using the wehoop R package[@wehoop]. After obtaining the raw play-by-play data, additional parsing was done to prepare the data for the adjusted plus-minus models. First, the lineups for both teams during each play were determined by pulling the starters and replacing a player each time a substitution occurred. There are occurrences in the data where a substitution was not correctly recorded or one of the players involved in the substitution is missing. To account for most of those errors, the play-by-play data was cross-referenced against Basketball Reference to determine the correct substitutions. This corrected most of the discrepancies in the play-by-play data, but a few other instances required manual correction of the substitutions. The most common scenario requiring manual correction is a substitution at the beginning of a quarter that is incorrectly recorded in the ESPN play-by-play data. Other sources of WNBA play-by-play data, such as Basketball Reference and WNBA.com, do not include substitutions at the beginning of a quarter, resulting in the need to manually correct a number of substitutions. This should be taken into account when expanding to additional seasons of WNBA play, since either some substitutions will need to be manually corrected or certain games will need to be excluded when calculating the point differential per possession.
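A simplified sketch of the lineup-tracking logic for one team is shown below; the column names (`type`, `player_out`, `player_in`) are placeholders for illustration and will not necessarily match the raw wehoop output.
```{r, eval=FALSE}
# Carry a team's five-player lineup through one game's plays, swapping a player
# whenever a substitution event appears in the play-by-play.
track_lineups <- function(plays, starters) {
  lineup <- starters                      # character vector of the five starters
  lineups <- vector("list", nrow(plays))
  for (i in seq_len(nrow(plays))) {
    if (plays$type[i] == "substitution") {
      lineup[lineup == plays$player_out[i]] <- plays$player_in[i]
    }
    lineups[[i]] <- lineup                # lineup in effect for this play
  }
  lineups
}
```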
Once the lineup for both teams was determined for every play, the play-by-play data was reduced to possessions, segments that end when possession of the ball changes from one team to the other. Most often, a defensive rebound, a turnover, or a made shot indicated a change of possession, although a few other scenarios had to be accounted for, such as being fouled while making a shot and jump balls. Once the data was separated into possessions, the point differential from the start of each possession to its end was calculated, with the difference being positive if the home team scored and negative if the away team scored. Whether the home or away team had the ball was also recorded in order to account for home court advantage when fitting the models. Dummy variables were created for every WNBA player who played in 2019, and each possession recorded the 10 players who were on the court for that possession, which then allowed the adjusted plus-minus models to be fit.
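A hedged sketch of how the possession-level design matrix could be assembled is below, assuming a possession table with a list-column `on_court` holding the ten players on the floor; the object and column names are illustrative rather than the exact code used for this report.
```{r, eval=FALSE}
# Build a 0/1 indicator column for every WNBA player who appears in the data.
all_players <- sort(unique(unlist(possessions$on_court)))
X <- t(sapply(possessions$on_court, function(p) as.integer(all_players %in% p)))
colnames(X) <- all_players

# Attach the home-possession indicator and the point differential response.
X <- cbind(X, home_possession = possessions$home_possession)
y <- possessions$point_diff
```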
# Model Specification and Justification
Adjusted plus-minus models are fit using the framework of multiple linear regression. The adjusted plus-minus (APM) model is:
\begin{center}$\textbf{y} = \textbf{X}\boldsymbol\beta + \boldsymbol\epsilon$\end{center}
\begin{center}$\boldsymbol\epsilon \sim \mathcal{MVN}(\textbf{0}, \sigma^2\textbf{I})$\end{center}
where $y_i$ is the point differential with respect to the home team for the $i$th possession and $\textbf{X}$ is a matrix with a column for every WNBA player who played in 2019, plus an indicator for whether the possession belonged to the home team, and a row for every possession of the 2019 season. The players who were on the court for a possession are indicated by a 1 in their respective columns. The $\boldsymbol\beta$ coefficients are the estimates of each player's APM per possession. Since APM is often reported per 100 possessions, these coefficients were multiplied by 100.
The regularized adjusted plus-minus model (RAPM) is:
\begin{center}$\textbf{y} = \textbf{X}\boldsymbol\beta + \boldsymbol\epsilon$\end{center}
\begin{center}$\boldsymbol\epsilon \sim \mathcal{MVN}(\textbf{0}, \sigma^2\textbf{I})$\end{center}
$$L(\lambda, \boldsymbol\beta) = \|\textbf{y} - \textbf{X}\boldsymbol\beta\|^2 + \lambda \sum_{p=1}^{P}\beta_p^2$$
$$\hat{\boldsymbol\beta}_{\text{ridge}} = \arg\min_{\boldsymbol\beta} \; L(\lambda, \boldsymbol\beta)$$
where $y_i$, $\textbf{X}$, and $\boldsymbol\beta$ are the same as in the APM model, with an additional shrinkage parameter, $\lambda$, that penalizes the ridge regression model based on the size of the estimated $\beta$ coefficients. Ridge regression is similar to linear regression, but with this additional penalty included. As the coefficients get larger, the penalty increases, so the penalization shrinks the coefficients towards 0. Because the loss function being minimized includes the penalty term, most coefficients will be much smaller than they are in the APM model.
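As a small, self-contained illustration of this shrinkage (separate from our WNBA data), the chunk below fits `lm.ridge` on two simulated, highly correlated predictors across a grid of $\lambda$ values; the coefficient estimates move toward zero as $\lambda$ grows.
```{r, eval=FALSE}
library(MASS)

set.seed(1)
# Two nearly collinear predictors, mimicking teammates who are almost always
# on the court together.
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y_sim <- 2 * x1 + 2 * x2 + rnorm(n)

# Coefficient estimates for increasing amounts of penalization.
shrink_demo <- lm.ridge(y_sim ~ x1 + x2, lambda = c(0, 10, 100, 1000))
coef(shrink_demo)
```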
Both of these models have assumptions that are essential if they are to be used to make inference about the data. For the APM model, the essential assumptions are linearity, independence between observations, and normality and homoscedasticity of the residuals. Linearity is somewhat of a superfluous assumption for this model because all of the explanatory variables are binary, and added-variable plots of binary variables do not reveal much information; the linearity assumption is therefore taken to be met. There are some potential issues with the independence assumption. For this analysis, possessions are assumed to be independent of each other, and although one could argue that possessions are not completely independent, this assumption is not unreasonable. The figure below shows a histogram of the standardized residuals and a fitted values vs. residuals plot; both the normality and homoscedasticity assumptions appear to be met, so it seems reasonable to use multiple linear regression to model the data.
```{r, echo=FALSE, message=FALSE, warning=FALSE, out.width='67%', fig.align='center'}
library(tidyverse)
library(patchwork)
library(MASS)
library(broom)
points <- read.csv("Data/point_diff_2019_updated.csv", header = TRUE)
# Set the y vector and X matrix
y <- points$point_diff
x <- points %>% dplyr::select(-point_diff, -game_id, -home_possession) %>% as.matrix()
# Fit the APM (multiple linear regression) and RAPM (ridge regression) models
ridge.model <- lm.ridge(y ~ -1 + x, lambda = 2000)
model <- lm(y ~ -1 + x)
stdresiduals <- stdres(model)
# Check normality with a histogram of the standardized residuals
plot1 <- ggplot() +
  geom_histogram(aes(x = stdresiduals, y = ..density..),
                 binwidth = 1, color = "black", fill = "white") +
  stat_function(fun = dnorm, color = "red", lwd = 1) +
  labs(x = "Standardized Residuals", y = "Density") +
  theme_minimal()
# Check equal variance with a fitted values vs. standardized residuals plot
plot2 <- ggplot() +
  geom_point(aes(x = model$fitted.values, y = stdresiduals)) +
  geom_hline(yintercept = 0, col = "blue") +
  labs(x = "Fitted Values", y = "Standardized Residuals") +
  theme_minimal()
plot1 + plot2
```
Since these assumptions are met, the APM model can be fit. Even though the key assumptions for multiple linear regression are met, there is one other area of concern suggesting that the APM model is not the best model for determining player value. Multicollinearity is an issue for the APM model because many players frequently appear on the court together as a result of substitution patterns and scouting by teams. The multicollinearity makes it difficult to determine which player actually had a positive or negative effect on the team. Also, players who played very few possessions have unstable and unreliable estimates from the APM model, and the variances of these estimates are generally large. As a result of these issues, RAPM is used. As previously mentioned, ridge regression shrinks coefficients towards 0, which keeps players who did not play much from having extreme coefficient estimates. Another benefit of ridge regression is that correlated players are shrunk towards each other rather than one of the correlated variables being dropped, which is what happens with LASSO. MacDonald[@macdonald2012] also showed that single-season estimates from ridge regression are stable enough to have value.
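One quick way to see this multicollinearity is to inspect the largest pairwise correlations between player indicator columns; the sketch below assumes the design matrix `x` built from the possession data earlier in this report.
```{r, eval=FALSE}
# Pairwise correlations between player indicator columns; teammates who share
# nearly all of their possessions show correlations close to 1.
player_cors <- cor(x)
diag(player_cors) <- NA
head(sort(abs(player_cors), decreasing = TRUE))
```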
The only assumption needed for ridge regression is linearity, and since all of the variables are binary indicator variables, this assumption is taken to be met and it is appropriate to use ridge regression for this analysis. An important part of the ridge regression model is tuning the shrinkage parameter, $\lambda$. The figure below shows that the generalized cross-validation (GCV) score continues to decrease as $\lambda$ increases. It is not reasonable to make $\lambda$ arbitrarily large, because doing so would shrink all of the coefficients towards 0 and the model would produce no useful results. Joe Sill[@sill2010] found that a $\lambda$ of 2000 worked well with NBA data, and that value has been used for this analysis. Additional work could examine the effect of different $\lambda$ values on the estimates, but that work is left for another analysis.
```{r, echo=FALSE, message=FALSE, warning=FALSE, out.width='67%', fig.align='center'}
# Refit the ridge regression over a grid of lambda values to examine the GCV score
rapm2 <- lm.ridge(formula = point_diff ~ ., data = points,
                  lambda = seq(0, 5000, 200))
r2 <- tidy(rapm2)
g <- glance(rapm2)
# Plot GCV against lambda, marking the GCV-minimizing lambda with a dashed red line
ggplot(r2, aes(lambda, GCV)) +
  geom_line() +
  geom_vline(xintercept = g$lambdaGCV, col = "red", lty = 2) +
  theme_minimal()
```
# Results
The two tables below show the top ten coefficients from each model along with the variance of each coefficient. The variances of the RAPM coefficients are much smaller than the variances of the APM coefficients, which is one of the reasons the RAPM model is preferred over the APM model. The RAPM coefficients have a small enough variance to determine whether some players are significantly better or worse than the average player.
```{r, message=FALSE, warning=FALSE, echo=FALSE}
# Scale both sets of coefficients to be per 100 possessions
ridge.coefs <- ridge.model$coef * 100
coefs <- model$coef * 100
# Variance of the APM coefficients from the fitted linear model (scaled)
lm_var <- diag(vcov(model)) * 1000
# Estimate sigma^2 and the variance of the ridge (RAPM) coefficients (scaled)
s2 <- 1 / (nrow(x) - ncol(x)) * t(y - x %*% matrix(ridge.model$coef)) %*% (y - x %*% matrix(ridge.model$coef))
ridge_var_part1 <- solve(t(x) %*% x + 2000 * diag(ncol(x))) %*% t(x) %*% x %*% solve(t(x) %*% x + 2000 * diag(ncol(x)))
ridge_var <- as.vector(s2) * diag(ridge_var_part1) * 1000
# Combine player names with the coefficients and variances from both models
player2019 <- read.csv("Data/Players_2019.csv")
ModelInfo <- player2019 %>%
  arrange(Player) %>%
  bind_cols(APM_coef = coefs, APM_var = lm_var, RAPM_coef = ridge.coefs, RAPM_var = ridge_var)
rapm_players <- ModelInfo %>%
dplyr::arrange(desc(RAPM_coef)) %>%
dplyr::select(Player, RAPM_coef, RAPM_var) %>%
mutate(RAPM_coef = round(RAPM_coef, digits = 3),
RAPM_var = round(RAPM_var, digits = 3)) %>% head(10)
apm_players <- ModelInfo %>%
dplyr::arrange(desc(APM_coef)) %>%
dplyr::select(Player, APM_coef, APM_var) %>%
mutate(APM_coef = round(APM_coef, digits = 3),
APM_var = round(APM_var, digits = 3)) %>% head(10)
names(rapm_players) <- c("Player", "RAPM", "RAPM Variance")
names(apm_players) <- c("Player", "APM", "APM Variance")
knitr::kable(rapm_players)
knitr::kable(apm_players)
```
The figure below also illustrates that the variance is smaller for the RAPM model. The coefficient estimates for the RAPM model range from about -3 to 3, while the coefficient estimates for the APM model range from approximately -20 to 20. While the distribution of the coefficients is approximately normal for both models, as expected, inference can only be made about the coefficients from the RAPM model because the variance of the coefficients in the APM model is too large.
```{r, echo=FALSE, warning=FALSE, message=FALSE, out.width='67%', fig.align='center'}
plot_rapm <- ggplot(ModelInfo, aes(x=(RAPM_coef))) +
geom_histogram(aes(y = ..density..),
color = 'black',
bins = 10,
fill = '#a9daff') +
stat_function(fun = stats::dnorm,
args = list(
mean = mean(ModelInfo$RAPM_coef, na.rm = TRUE),
sd = stats::sd(ModelInfo$RAPM_coef, na.rm = TRUE)
),
col = '#317196',
size = 2) +
xlab("RAPM") +
ylab('Density') +
theme_minimal() +
theme(panel.background = element_rect(fill = "transparent", colour = NA),
      plot.background = element_rect(fill = "transparent", colour = NA)) +
geom_text(x = -2.5, y = .4, label = "Variance: 0.939", check_overlap = TRUE)
plot_apm <- ggplot(ModelInfo, aes(x=(APM_coef))) +
geom_histogram(aes(y = ..density..),
color = 'black',
bins = 10,
fill = '#a9daff') +
stat_function(fun = stats::dnorm,
args = list(
mean = mean(ModelInfo$APM_coef, na.rm = TRUE),
sd = stats::sd(ModelInfo$APM_coef, na.rm = TRUE)
),
col = '#317196',
size = 2) +
xlab("APM") +
ylab('Density') +
theme_minimal() +
theme(panel.background = element_rect(fill = "transparent", colour = NA),
      plot.background = element_rect(fill = "transparent", colour = NA)) +
geom_text(x = -16.5, y = .05, label = "Variance: 38.580", check_overlap = TRUE)
plot_apm / plot_rapm
```
The figure below also provides an interesting finding for certain players. Although there is very little correlation between RAPM and a player's salary, some players outperform their salary by contributing a lot despite being paid relatively little, while other players contribute less even though they are paid more. This is a useful finding for both front offices and players: players can use adjusted plus-minus to show their true contribution to the team, while front offices can determine which players are worth paying more when deciding which free agents to sign or whether to offer extensions to their own players.
```{r, echo=FALSE, message=FALSE, warning=FALSE, out.width='67%', fig.align='center'}
# Join player salaries with the model coefficients
salary <- read.csv("Data/WNBA player salary 20192.csv")
all_salary <- ModelInfo %>% inner_join(salary, by = "Player")
# RAPM against salary, with reference lines at the median salary and at RAPM = 0
ggplot(all_salary, aes(x = Salary, y = RAPM_coef)) +
  geom_point(col = '#317196', size = 2) +
  labs(y = "RAPM") +
  scale_x_continuous(labels = scales::dollar_format()) +
  geom_vline(xintercept = median(all_salary$Salary, na.rm = TRUE)) +
  geom_hline(yintercept = 0) +
  theme_minimal()
```
# Conclusion
Going forward, our models would be greatly improved if we were able to incorporate player tracking data for WNBA players. Currently, we account only for whether a player is on or off the court and whether a team is at home or away. Player tracking would provide information such as which players are near each other on the court and which players are more active. It is unclear whether this data exists for all WNBA teams, but at least four WNBA teams do collect it. Gaining access to it could improve our player ratings and provide valuable insights into player value metrics.
In addition to player tracking data, we could easily expand our models to include all previous seasons of the WNBA. This would allow us to more deeply assess APM ratings for players across seasons.
Finally, it would be natural to apply this model to college basketball teams. We could compare the APM ratings of college players who move on to the NBA or WNBA to identify potential patterns in their ratings. This could be helpful in evaluating how players change when they enter the NBA or WNBA and could potentially lead to better predictions about which college players will perform best in the NBA or WNBA.
Adjusted plus-minus models allow us to understand how effective a player is for their entire team. Through our model we can clearly see which WNBA players improve their team’s scoring margin over the other team. This allows us to understand which statistics are important and which players are true “team players” who do what is best for their team. It will help players be assessed properly and even be paid the salary they deserve for their contribution to their team. Adjusted plus-minus models in the WNBA will help managers understand which players are valuable for winning championships and creating legacies.
# References
<div id="refs"></div>