Unveiling black boxes with SHAP Values #

Correct interpretation of model predictions is crucial: it builds user confidence, helps us understand the process being modeled, and suggests how to improve the model. In some domains, such as finance, simple models are preferred because they are easy to interpret, but they usually do not match the performance of complex ones. Various methods have been developed to ease this trade-off between accuracy and interpretability. In this post I want to talk about one of them, the SHAP framework.

What are SHAP Values? #

SHAP (SHapley Additive exPlanations) values assign each feature a value that reflects its contribution to the model prediction. They are based on Shapley values, a concept from cooperative game theory introduced by Lloyd Shapley. In this context, each feature is treated as a player in a game, and the Shapley value measures the average marginal contribution of that feature across all possible combinations of features.

The Mathematics Behind SHAP Values #

As already mentioned, the Shapley value is a concept from cooperative game theory. For each such game, it specifies how the total payoff earned by the coalition of all players is distributed among the individual players.

Formal Definition: #

The Shapley value for player \( i \) in a cooperative game is defined as the player’s average marginal contribution over all coalitions. Formally, we have a set \(N\) of players and a characteristic function \(v\) that maps each subset of players to a real number (the gain of that coalition), with \(v(\emptyset)=0\), meaning that the empty coalition is worth nothing. Then the Shapley value \(\phi_i\) for player \(i\) is given by: \[\phi_i(v)=\sum_{S \subseteq N \setminus \{i\}}{\frac{|S|!\,(|N|-|S|-1)!}{|N|!} \left(v(S \cup \{i\})-v(S)\right)}\]

where:

\(S\) is a coalition (a subset of players) that does not contain player \(i\).
\(|S|\) is the number of players in the coalition \(S\), and \(|N|\) is the total number of players.
\(v(S)\) is the total gain of the coalition \(S\).

This formula calculates the marginal contribution of player \(i\) to each possible coalition and then takes a weighted average, where the weight is the fraction of player orderings in which exactly the members of \(S\) come before \(i\).
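To make the formula concrete, here is a minimal sketch that evaluates it by brute force for a three-player game. The players `A`, `B`, `C` and the gains in `GAINS` are made-up numbers for illustration only, and the function name `shapley_values` is my own. Enumerating every coalition is only feasible for a handful of players.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values: enumerate every coalition S that excludes player i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # |S| runs from 0 to n - 1
            # Weight |S|! (n - |S| - 1)! / n! from the formula above.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                # Marginal contribution of player i to coalition S.
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


# Hypothetical characteristic function: the gain of every possible coalition.
GAINS = {
    frozenset(): 0,
    frozenset("A"): 10,
    frozenset("B"): 20,
    frozenset("C"): 30,
    frozenset("AB"): 40,
    frozenset("AC"): 50,
    frozenset("BC"): 60,
    frozenset("ABC"): 90,
}


def toy_gain(S):
    return GAINS[frozenset(S)]


phi = shapley_values(["A", "B", "C"], toy_gain)
for player, value in phi.items():
    print(player, round(value, 6))
```

In this toy game the three values add up to the gain of the full coalition (90), which is the payoff distribution described above.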

Extension to SHAP Values and Properties: #

SHAP combines local interpretability methods (linear LIME, for example) with Shapley values. The result is an explanation with three desirable properties:

Local accuracy: \[f(x)=g(x')=\phi_0+\sum^M_{i=1}{\phi_i x'_i}\] The explanation model \(g(x')\) matches the original model \(f(x)\) when \(x=h_x(x')\), where \(\phi_0=f(h_x(0))\) is the model output with all simplified inputs toggled off (missing).
Missingness: \[x'_i=0 \implies \phi_i=0\] Features that are missing from the original input have no attributed impact.
Consistency: Let \(f_x(z')=f(h_x(z'))\) and let \(z'\backslash i\) denote setting \(z'_i=0\). For any two models \(f\) and \(f'\), if \[f'_x(z')-f'_x(z'\backslash i)\geq f_x(z')-f_x(z'\backslash i)\] for all inputs \(z' \in \left\{0,1\right\}^M\), then \(\phi_i(f',x)\geq\phi_i(f,x)\). This means that if a model changes so that the marginal contribution of a feature increases or stays the same (regardless of the other features), the attribution \(\phi_i\) for that feature does not decrease.
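Local accuracy is easy to check numerically. The sketch below is only an illustration under my own assumptions: a hypothetical linear model `f` with made-up coefficients, and a mapping `h_x` that replaces "missing" features with fixed baseline values, which is just one simple choice.

```python
from itertools import combinations
from math import factorial

WEIGHTS = [2.0, -1.0, 0.5]   # hypothetical model coefficients
BIAS = 1.0
x = [3.0, 4.0, 0.0]          # instance being explained
baseline = [1.0, 1.0, 0.0]   # assumed reference values for missing features
M = len(x)


def f(inputs):
    return BIAS + sum(w * value for w, value in zip(WEIGHTS, inputs))


def h_x(z):
    """Map a binary vector z' to the input space: keep x_i if z'_i = 1, else use the baseline."""
    return [xi if zi else bi for xi, bi, zi in zip(x, baseline, z)]


def f_x(present):
    """Model output when only the features in `present` are switched on."""
    return f(h_x([1 if j in present else 0 for j in range(M)]))


def shap_values():
    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        total = 0.0
        for size in range(M):
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for coalition in combinations(others, size):
                S = set(coalition)
                total += weight * (f_x(S | {i}) - f_x(S))
        phi.append(total)
    return phi


phi0 = f(h_x([0] * M))   # output with every simplified input toggled off
phi = shap_values()

# Local accuracy: phi0 plus the attributions reproduces f(x).
print(phi0 + sum(phi), f(x))
```

For a linear model with a fixed baseline, each attribution reduces to the coefficient times the distance of the feature from its baseline, so the third feature, which equals its baseline here, gets an attribution of zero.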

Computation of SHAP Values #

Kernel SHAP #

Kernel SHAP is a model-agnostic method for approximating SHAP values. It uses linear LIME to approximate the original model locally.

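LIME normally chooses its parameters heuristically. Kernel SHAP instead fixes the regularization term \(\Omega\), the weighting kernel \(\pi_x\), and the loss \(L\) to the following: \[\Omega(g)=0,\] \[\pi_{x}(z')=\frac{(M-1)}{(M \text{ choose } |z'|)|z'|(M-|z'|)}\] \[L(f,g,\pi_{x})=\sum_{z'\in Z}{[f(h_x(z'))-g(z')]^2\pi_{x}(z')}\] Here \(M\) is the number of simplified features and \(|z'|\) is the number of non-zero entries in \(z'\).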

Then, since \(g(z')\) is linear and \(L\) is a squared loss, the LIME objective \(\xi=\underset{g \in \mathcal{G}}{\operatorname{argmin}}\; L(f,g,\pi_{x})+\Omega(g)\) can be solved using weighted linear regression.
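The weighting kernel is simple to compute. Here is a small sketch. The function name `shapley_kernel` is my own, and returning `float("inf")` for the empty and full coalitions (where the denominator is zero) is one possible way to represent that case.

```python
from math import comb


def shapley_kernel(M, size):
    """Kernel SHAP weight for a coalition with `size` of the M features present."""
    if size == 0 or size == M:
        # The denominator vanishes, so the weight is infinite.
        return float("inf")
    return (M - 1) / (comb(M, size) * size * (M - size))


# Weights per coalition size for M = 4 features.
for size in range(5):
    print(size, shapley_kernel(4, size))
```

Very small and very large coalitions receive the largest weights, because they isolate the effect of individual features best.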

Illustrative example #

Model and Instance: Let’s say we have a predictive model \(f\) and a dataset with three features. We want to understand how each feature contributes to the model’s prediction for a specific data point \(x = (x_1,x_2,x_3)\) by computing SHAP values.
Generating coalitions: To do this, we consider all possible coalitions of features that could be used to make a prediction. Each coalition is encoded as a binary vector \(z'\) in which 1 means the feature is present and 0 means it is missing. In our case the coalitions are: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1).
Obtaining model outputs for coalitions: For each of these coalitions, we compute the model output \(f(h_x(z'))\). The missing features must be imputed, for example with values taken from a background dataset. We obtain the following outputs: \(f(\emptyset), f(x_1),f(x_2),f(x_3),f(x_1,x_2),f(x_1,x_3),f(x_2,x_3),f(x_1,x_2,x_3)\)
Obtaining weights for coalitions: \[\pi_{x}(z')=\frac{(M-1)}{(M \text{ choose } |z'|)|z'|(M-|z'|)}\] For example, \(\pi_x((0,0,1))=\frac{3-1}{\frac{3!}{1!(3-1)!}\cdot 1\cdot(3-1)}=\frac{2}{6}=\frac{1}{3}\). For the empty and the full coalition (\(|z'|=0\) or \(|z'|=M\)) the weight is infinite, so in practice these two coalitions are enforced as constraints rather than treated as weighted samples.
Fitting the explanation model \(g\): Finally, we train a linear explanation model \(g\) to match the outputs of our original model \(f\). The weights of \(g\) are obtained by minimizing the following loss function \[L(f,g,\pi_{x})=\sum_{z'\in Z}{[f(h_x(z'))-g(z')]^2\pi_{x}(z')}\] The fitted weights of \(g\) are the Shapley values.
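The steps above can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the model `f`, the instance `x`, and the all-zero `baseline` used to impute missing features are made up. The two infinitely weighted coalitions are handled as hard constraints (\(\phi_0\) equals the output of the empty coalition, and the attributions sum to the difference between the full and the empty coalition), which lets us eliminate the last attribution and solve a small weighted least squares problem for the rest.

```python
from itertools import product
from math import comb


def f(a, b, c):
    # Hypothetical model with an interaction between the 2nd and 3rd feature.
    return 2 * a + b * c + 1


x = (1.0, 2.0, 3.0)          # instance being explained
baseline = (0.0, 0.0, 0.0)   # assumed imputation values for missing features
M = len(x)


def f_x(z):
    """Model output for coalition z: keep x_i where z_i = 1, impute otherwise."""
    return f(*[xi if zi else bi for xi, bi, zi in zip(x, baseline, z)])


def kernel_weight(size):
    return (M - 1) / (comb(M, size) * size * (M - size))


def solve(A, b):
    """Solve A u = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def kernel_shap():
    phi0 = f_x((0,) * M)            # empty coalition fixes the intercept
    delta = f_x((1,) * M) - phi0    # full coalition fixes the sum of attributions
    k = M - 1                       # the last attribution is delta minus the others
    AtWA = [[0.0] * k for _ in range(k)]
    AtWb = [0.0] * k
    for z in product((0, 1), repeat=M):
        size = sum(z)
        if size in (0, M):
            continue                # already enforced as constraints
        w = kernel_weight(size)
        a = [z[i] - z[-1] for i in range(k)]
        b = f_x(z) - phi0 - z[-1] * delta
        for r in range(k):
            AtWb[r] += w * a[r] * b
            for col in range(k):
                AtWA[r][col] += w * a[r] * a[col]
    phi = solve(AtWA, AtWb)         # weighted normal equations
    phi.append(delta - sum(phi))
    return phi0, phi


phi0, phi = kernel_shap()
print(phi0, phi)
```

Because all coalitions are enumerated, the regression recovers the exact Shapley values for this toy model: the first feature gets its additive effect, and the interaction term is split evenly between the second and third features. With many features, the coalitions have to be sampled instead, and the result becomes an approximation.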

Interpreting SHAP Values #

Individual Instance Interpretation: #

Feature Contribution: SHAP values measure how much, and in which direction, each feature moves the model’s prediction for an individual instance relative to the base value \(\phi_0\). A positive value means the feature pushes the prediction up, while a negative value means it pushes the prediction down.
Magnitude: The absolute value of a SHAP value indicates how much influence the feature has on the prediction. Larger magnitudes mean the feature plays a bigger role in shaping the final outcome.
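As a small illustration of reading a single explanation, the sketch below uses made-up SHAP values and a made-up base value. The feature names are hypothetical as well.

```python
# Hypothetical explanation for one instance.
base_value = 0.30
shap_values = {"age": 0.12, "income": -0.25, "tenure": 0.03}

# The prediction is the base value plus all feature contributions.
prediction = base_value + sum(shap_values.values())

# Rank features by magnitude, then read the sign as the direction of the effect.
ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranked:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name} {direction} the prediction by {abs(value):.2f}")

print(f"prediction = {prediction:.2f}")
```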

Global Feature Importance: #

Feature Importance Ranking: The average of the absolute SHAP values for each feature across all instances gives us a global ranking of feature importance. This ranking helps identify the features that consistently have the most significant impact on model predictions.
Understanding Model Behavior: Studying the distribution of a feature’s SHAP values across instances shows how the model reacts to different values of that feature. This can expose biases or non-linear effects in the model.
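The global ranking can be sketched in a few lines. The matrix below holds made-up SHAP values (one row per instance, one column per feature), and the feature names are hypothetical.

```python
feature_names = ["age", "income", "tenure"]

# Hypothetical SHAP values: rows are instances, columns are features.
shap_matrix = [
    [0.12, -0.25, 0.03],
    [-0.08, 0.30, 0.01],
    [0.20, -0.10, -0.02],
]

# Mean absolute SHAP value per feature.
n_rows = len(shap_matrix)
mean_abs = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_rows
    for j, name in enumerate(feature_names)
}

# Features sorted from most to least important.
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)
print(ranking)
```

Taking absolute values before averaging matters: a feature with large positive and large negative contributions would otherwise cancel out and look unimportant.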

Applications of SHAP Values #

SHAP values are a powerful tool with several applications:

1. Model Debugging: we can identify features that cause problems in predictions, which can indicate, for example, data leakage or spurious correlations.

2. Fairness and Bias Analysis: we can detect bias when the model makes unfair predictions based on attributes such as race or gender. Understanding the impact of these features can help us develop fairer models.

Conclusion: #

Main takeaways: #

The Shapley value measures the average marginal contribution of each feature across all possible subsets of features.
SHAP values connect Shapley values to local interpretability methods, providing properties such as local accuracy, missingness, and consistency.
SHAP values help us better understand both individual predictions and global model behavior, through feature contribution analysis, feature importance ranking, and the study of SHAP value distributions.
They are commonly used for model debugging, where they help identify problematic features, and for developing fairer, less biased models.

References: #

The blog post is mainly based on the original paper introducing SHAP values - A Unified Approach to Interpreting Model Predictions (nips.cc)

The original implementation of SHAP - shap/shap: A game theoretic approach to explain the output of any machine learning model. (github.com)

My implementation of Kernel SHAP - https://colab.research.google.com/drive/1TPHvns2psDNKknwubxCTHZprw3UGB-w1?usp=sharing