boxchart
Description
boxchart(explainer) creates a box chart, or
box plot, for each predictor in explainer.BlackboxModel.PredictorNames,
where explainer is a shapley object. For
each predictor, the function displays the Shapley values for the query points in
explainer.QueryPoints. Each box plot displays the median, the lower and upper quartiles, any outliers (computed using the
interquartile range), and the minimum and maximum values that are not outliers.
If explainer.BlackboxModel is a classification model, the function
displays box plots for class explainer.BlackboxModel.ClassNames(1) by
default.
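The statistics in each box can be reproduced for a single predictor with a short sketch. The vector phi below is hypothetical (it stands in for one predictor's Shapley values across query points). The quartiles come from quantile with its default method, which may differ slightly from the method boxchart uses internally, and the outlier fences assume the conventional 1.5 × IQR rule.

```matlab
% Hypothetical Shapley values of one predictor across 11 query points
phi = [-1.9 -1.2 -1.0 -0.6 -0.2 0.0 0.3 0.7 1.0 1.4 4.5];

med    = median(phi);                % line inside the box
q      = quantile(phi,[0.25 0.75]);  % lower and upper quartiles (box edges)
iqrVal = q(2) - q(1);                % interquartile range

% Assumed fences: values more than 1.5*IQR beyond the box are outliers
lowerFence = q(1) - 1.5*iqrVal;
upperFence = q(2) + 1.5*iqrVal;
isOut = phi < lowerFence | phi > upperFence;

% Whiskers end at the minimum and maximum of the values that are not outliers
whiskerLow  = min(phi(~isOut));
whiskerHigh = max(phi(~isOut));
outliers    = phi(isOut);
```

For this vector, the median is 0, the value 4.5 lies above the upper fence and is reported as an outlier, and the whiskers end at -1.9 and 1.4.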
boxchart(explainer,Name=Value) specifies additional options using one or more name-value arguments. For example, specify
NumImportantPredictors=5 to create box plots for the five predictors with
the greatest mean absolute Shapley values
(explainer.MeanAbsoluteShapley).
boxchart(ax,___) displays the box plots in the target axes ax. Specify ax as the
first argument in any of the previous syntaxes.
b = boxchart(___) returns a BoxChart object using any of the input argument combinations in the
previous syntaxes. Use b to query or modify the properties (BoxChart Properties) of the object after you create it.
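The ax and b syntaxes can be combined in one figure. The following sketch assumes the carbig regression setup from the first example below (Statistics and Machine Learning Toolbox required). The two-tile layout, the face color, and the titles are illustrative choices, not part of the documented example. It uses set rather than dot assignment so that the same call works whether b holds one BoxChart object or several.

```matlab
% Regression setup taken from the carbig example on this page
load carbig
tbl = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,MPG);
tbl = rmmissing(tbl);

rng("default") % For reproducibility
mdl = fitrkernel(tbl,"MPG",CategoricalPredictors=[2 5], ...
    Standardize=true);
explainer = shapley(mdl,tbl);
explainer = fit(explainer,tbl);    % Shapley values for every row of tbl

% Plot into explicit target axes and keep the returned BoxChart handle(s)
tiledlayout(1,2)
ax1 = nexttile;
b = boxchart(ax1,explainer,NumImportantPredictors=3);
title(ax1,"Three most important predictors")   % illustrative title

% Modify properties after creation (illustrative color choice)
set(b,"BoxFaceColor",[0.85 0.33 0.10],"JitterOutliers","on")

% Second target axes: all predictors with default styling
ax2 = nexttile;
boxchart(ax2,explainer);
title(ax2,"All predictors")
```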
Examples
Train a regression model and create a shapley object. Use the fit object function to compute the Shapley values for the specified query points. Then visualize the Shapley values for multiple query points by using the boxchart object function.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on, as well as the response variable MPG.
tbl = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,MPG);
Removing missing values in a training set helps to reduce memory consumption and speed up training for the fitrkernel function. Remove missing values in tbl.
tbl = rmmissing(tbl);
Train a blackbox model of MPG by using the fitrkernel function. Specify the Cylinders and Model_Year variables as categorical predictors. Standardize the remaining predictors.
rng("default") % For reproducibility
mdl = fitrkernel(tbl,"MPG",CategoricalPredictors=[2 5], ...
    Standardize=true);
Create a shapley object. Because mdl does not contain training data, specify the data set tbl.
explainer = shapley(mdl,tbl)
explainer =
BlackboxModel: [1×1 RegressionKernel]
QueryPoints: []
BlackboxFitted: []
Shapley: []
X: [392×7 table]
CategoricalPredictors: [2 5]
Method: "interventional-kernel"
Intercept: 23.2474
NumSubsets: 64
explainer stores the training data tbl in the X property. By default, shapley subsamples 100 observations from the data in X, and stores their indices in the SampledObservationIndices property.
Compute the Shapley values for all observations in tbl. To speed up computations, the fit object function uses the sampled observations instead of all of X to compute the Shapley values. If you have a Parallel Computing Toolbox™ license, you can further reduce computational time by setting the UseParallel name-value argument.
explainer = fit(explainer,tbl,UseParallel=true)
explainer =
shapley explainer with the following mean absolute Shapley values:
Predictor Value
______________ _______
"Acceleration" 0.5678
"Cylinders" 0.96799
"Displacement" 0.79668
"Horsepower" 0.78681
"Model_Year" 0.86258
"Weight" 0.987
For a regression model, fit computes Shapley values using the predicted response, and stores them in the Shapley property of the shapley object. Because explainer contains Shapley values for multiple query points, the function displays the mean absolute Shapley values by default.
Visualize the distribution of the Shapley values by using the boxchart object function.
boxchart(explainer)

For each predictor, the function displays a box plot of the Shapley values for the query points. The function determines the order of the predictors by using the mean absolute Shapley values.
The box plot for the Weight predictor indicates that the Shapley values are distributed symmetrically about the median. The minimum is slightly less than –2, the 25th percentile is approximately –1, the median is approximately 0, the 75th percentile is approximately 1, and the maximum is approximately 2.
Use a data tip to view the Shapley value metrics for the Weight predictor.
b = boxchart(explainer);
datatip(b,"DataIndex",6);
Train a classification model and create a shapley object. Then visualize the Shapley values for multiple query points by using the boxchart object function.
Load the CreditRating_Historical data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings.
tbl = readtable("CreditRating_Historical.dat");
Display the first three rows of the table.
head(tbl,3)
ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ _____ _____ _______ ________ _____ ________ ______
62394 0.013 0.104 0.036 0.447 0.142 3 {'BB'}
48608 0.232 0.335 0.062 1.969 0.281 8 {'A' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'A' }
Train a blackbox model of credit ratings by using the fitcecoc function. Use the variables from the second through seventh columns in tbl as the predictor variables. A recommended practice is to specify the class names to set the order of the classes.
blackbox = fitcecoc(tbl,"Rating", ...
    PredictorNames=tbl.Properties.VariableNames(2:7), ...
    CategoricalPredictors="Industry", ...
    ClassNames={'AAA','AA','A','BBB','BB','B','CCC'});
Create a shapley object that explains the predictions for multiple query points. For faster computation, shapley subsamples 100 observations from the predictor data in blackbox to compute the Shapley values. Specify the sampled observations as the query points in the call to the fit object function.
rng("default") % For reproducibility
explainer = shapley(blackbox);
queryPoints = explainer.X(explainer.SampledObservationIndices,:);
explainer = fit(explainer,queryPoints);
For a classification model, fit computes Shapley values using the predicted class scores, and stores them in the Shapley property of the shapley object. Because explainer contains Shapley values for multiple query points, display the mean absolute Shapley values instead.
explainer.MeanAbsoluteShapley
ans=6×8 table
Predictor AAA AA A BBB BB B CCC
__________ _________ __________ _________ _________ _________ _________ _________
"WC_TA" 0.055977 0.034453 0.027338 0.023902 0.036098 0.054763 0.054931
"RE_TA" 0.12468 0.10314 0.10787 0.087013 0.090298 0.17123 0.2552
"EBIT_TA" 0.0015598 0.00095166 0.0011936 0.0010499 0.0010047 0.0018817 0.0017712
"MVE_BVTD" 0.84966 0.68785 0.66198 0.94501 1.3672 1.5715 1.2161
"S_TA" 0.025009 0.0095605 0.010606 0.014469 0.0017235 0.0075275 0.012529
"Industry" 0.076169 0.085926 0.063854 0.046528 0.053801 0.11261 0.11829
For each predictor and class, the mean absolute Shapley value is the mean of the absolute Shapley values across all query points. For class AA, the MVE_BVTD predictor has a noticeably greater mean absolute Shapley value than the other predictors.
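As a small worked example of this definition, the sketch below uses a hypothetical matrix S with one row per predictor and one column per query point for a single class. That layout is assumed for illustration only; it is not necessarily how the Shapley property stores the values.

```matlab
% Hypothetical Shapley values for one class: 3 predictors x 4 query points
S = [ 0.5 -0.5  1.0 -1.0;    % predictor 1
      0.1  0.1 -0.1 -0.1;    % predictor 2
      2.0 -1.0  0.0  1.0];   % predictor 3

% Take absolute values first, then average across query points (columns)
meanAbs = mean(abs(S),2);

% Averaging the signed values instead lets positive and negative
% contributions cancel, which hides how strongly a predictor acts
meanSigned = mean(S,2);
```

Here meanAbs is [0.75; 0.1; 1.0], while meanSigned is [0; 0; 0.5]: predictor 1 has a strong effect on individual query points even though its signed contributions cancel on average.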
Visualize the distribution of the Shapley values for class AA by using the boxchart object function.
boxchart(explainer,ClassName="AA")
For each predictor, the function displays a box plot of the Shapley values for the query points. The function determines the order of the predictors by using the mean absolute Shapley values.
For class AA, some of the Shapley values for the RE_TA predictor are outliers. This result suggests that, for a few query points, the predictor greatly affects the class AA predicted score.
Input Arguments
explainer
Object explaining the blackbox model, specified as a shapley
object. explainer must contain Shapley values; that is,
explainer.Shapley must be nonempty.
ax
Axes for the plot, specified as an Axes object. If you do not specify ax, then boxchart creates the plot using the current axes. For more information on creating an Axes object, see axes.
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: boxchart(explainer,NumImportantPredictors=5,JitterOutliers="on")
creates a box plot for each of the five predictors with the greatest mean absolute Shapley
values, and jitters the outliers in the box plots.
NumImportantPredictors
Number of important predictors to plot, specified as a positive integer. The
boxchart function plots the Shapley values of the specified
number of predictors with the greatest mean absolute Shapley values.
Example: NumImportantPredictors=5 specifies to plot the five most important predictors. The boxchart function determines the order of importance by using the mean absolute Shapley values.
Data Types: single | double
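The selection rule behind NumImportantPredictors can be reproduced by ranking a table of mean absolute Shapley values. This sketch rebuilds, by hand, the values displayed in the carbig regression example above and keeps the top k. It illustrates the ranking rule only; it does not call boxchart and does not reflect its internal code.

```matlab
% Mean absolute Shapley values copied from the carbig regression example
Predictor = ["Acceleration";"Cylinders";"Displacement"; ...
             "Horsepower";"Model_Year";"Weight"];
Value = [0.5678; 0.96799; 0.79668; 0.78681; 0.86258; 0.987];
mas = table(Predictor,Value);

% Rank predictors from most to least important and keep the top k
k = 5;
ranked = sortrows(mas,"Value","descend");
topK = ranked.Predictor(1:k);
```

With k = 5, Acceleration, the predictor with the smallest mean absolute Shapley value, is the one left out.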
ClassName
Class label to plot, specified as a numeric scalar, logical scalar, character vector, string
scalar, or categorical scalar. The value and data type of ClassName
must match one of the class names in the ClassNames property of the
machine learning model in explainer
(explainer.BlackboxModel.ClassNames). The software accepts
character vectors, string scalars, and categorical scalars interchangeably.
This argument is valid only when the machine learning model (BlackboxModel) in explainer is a classification model.
Example: ClassName="AAA"
Data Types: single | double | logical | char | string | categorical
JitterOutliers
Outlier marker displacement, specified as "on" or
"off", or as numeric or logical 1
(true) or 0 (false). A
value of "on" is equivalent to true, and
"off" is equivalent to false. Therefore, you
can use the value of this property as a logical value. The value is stored as an
on/off logical value of type matlab.lang.OnOffSwitchState.
If you specify the JitterOutliers value as
"on", then boxchart randomly displaces the
outlier markers along the vertical direction to help you distinguish between outliers
that have similar Shapley values.
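JitterOutliers is also a property of the BoxChart objects that the base MATLAB boxchart function creates, so its effect and its matlab.lang.OnOffSwitchState storage can be tried without a shapley object. The data below is hypothetical: a cluster of nearly equal outliers whose markers would overlap without jitter. The horizontal orientation is chosen so that the displacement is vertical, as described above.

```matlab
% Hypothetical data: mostly standard normal, plus three nearly equal outliers
rng("default")
data = [randn(50,1); 6; 6.02; 6.04];

b = boxchart(data,"Orientation","horizontal");
b.JitterOutliers = "on";    % randomly displace overlapping outlier markers

% The property is stored as matlab.lang.OnOffSwitchState,
% so it can be used directly as a logical value
state = b.JitterOutliers;
if state
    disp("Outlier markers are jittered.")
end
```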
Example: JitterOutliers="on"
Data Types: single | double | logical | char | string
More About
In game theory, the Shapley value of a player is the average marginal contribution of the player in a cooperative game. In the context of machine learning prediction, the Shapley value of a feature for a query point explains the contribution of the feature to a prediction (response for regression or score of each class for classification) at the specified query point.
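The "average marginal contribution" definition can be made concrete by brute force for a tiny game. This sketch enumerates every coalition of three players and weights each marginal contribution by |S|!(n-|S|-1)!/n!. The payoff vector v is hypothetical and is not derived from any model on this page; shapley estimates Shapley values with its own methods (such as the "interventional-kernel" method shown in the examples) rather than with this exhaustive enumeration.

```matlab
n = 3;
% Hypothetical payoffs v(mask+1) for the coalition encoded by the bits of mask
% mask:  0    1    2    3      4    5      6      7
%        {}   {1}  {2}  {1,2}  {3}  {1,3}  {2,3}  {1,2,3}
v = [0 10 20 40 30 40 50 90];

phi = zeros(1,n);
for j = 1:n
    bitj = bitshift(1,j-1);                  % bit that encodes player j
    for mask = 0:2^n-1
        if bitand(mask,bitj) == 0            % coalition S without player j
            s = sum(bitget(mask,1:n));       % coalition size |S|
            w = factorial(s)*factorial(n-s-1)/factorial(n);
            gain = v(bitor(mask,bitj)+1) - v(mask+1);  % marginal contribution of j
            phi(j) = phi(j) + w*gain;
        end
    end
end
```

The result is phi = [65 95 110]/3, and sum(phi) equals v(end) - v(1) = 90, the game-theoretic counterpart of the additivity property stated in the next paragraph.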
The Shapley value of a feature for a query point is the contribution of the feature to the deviation from the average prediction. For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. That is, the sum of the average prediction and the Shapley values for all features corresponds to the prediction for the query point.
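For a linear model f(x) = b0 + x*w whose features are treated as independent (the interventional setting), the Shapley value of feature j has the closed form w_j*(x_j - mean_j), where mean_j is the mean of feature j over the background data. This makes the additivity statement easy to check by hand. The coefficients, background data, and query point are hypothetical, and the closed form applies to this linear case only.

```matlab
% Hypothetical linear model f(x) = b0 + x*w on three numeric features
b0 = 2;
w  = [1.5; -0.5; 3];
f  = @(X) b0 + X*w;

% Hypothetical background data (rows are observations) and query point
Xbg = [1 2 0; 3 0 1; 2 4 2; 0 2 1];
x   = [4 1 2];

% Interventional Shapley values of a linear model: w_j*(x_j - mean_j)
phi = w' .* (x - mean(Xbg,1));

avgPrediction = mean(f(Xbg));   % average prediction over the background data
prediction    = f(x);           % prediction for the query point
```

Here phi is [3.75 0.5 3], the average prediction is 6.25, and 6.25 + 3.75 + 0.5 + 3 equals the prediction 13.5.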
For more details, see Shapley Values for Machine Learning Model.
Tips
Use boxchart when explainer contains Shapley values for many query points.
Version History
Introduced in R2024a