CompactClassificationEnsemble
Compact classification ensemble
Description
Compact version of a classification ensemble. The compact version does not include the data used for training the classification ensemble. Therefore, you cannot perform some tasks with a compact classification ensemble, such as cross-validation. Use a compact classification ensemble for making predictions (classifications) on new data.
Creation
Create a CompactClassificationEnsemble object from a full ClassificationEnsemble or ClassificationBaggedEnsemble model object by using compact.
Properties
CategoricalPredictors — Categorical predictor indices
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
ClassNames — Unique class labels
This property is read-only.
List of the elements in Y with duplicates removed, returned as a categorical array, cell array of character vectors, character array, logical vector, or numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)
Data Types: double | logical | char | cell | categorical
CombineWeights — Method used to combine weak learner weights
This property is read-only.
Method used to combine weak learner weights, returned as either 'WeightedAverage' or 'WeightedSum'.
Data Types: char
Cost — Misclassification costs
This property is read-only.
Misclassification costs, returned as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.
Data Types: double
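As a quick illustration, the entries of Cost can be read off directly. This is a sketch only; it assumes a two-class compact ensemble stored in a workspace variable CMdl, like the one built in the Examples section.

```matlab
% Assumes CMdl is a compact classification ensemble for two classes
% {'b','g'}, as in the example below (hypothetical workspace variable).
CMdl.ClassNames            % class order used by the rows and columns of Cost
CMdl.Cost                  % default: zeros on the diagonal, ones elsewhere
costBtoG = CMdl.Cost(1,2); % cost of predicting class 2 when the true class is 1
```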
ExpandedPredictorNames — Expanded predictor names
This property is read-only.
Expanded predictor names, returned as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
Data Types: cell
NumTrained — Number of trained weak learners
This property is read-only.
Number of trained weak learners in the ensemble, returned as a positive integer.
Data Types: double
PredictorNames — Predictor names
This property is read-only.
Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.
Data Types: cell
Prior — Prior class probabilities
This property is read-only.
Prior probabilities for each class, returned as a K-element numeric vector, where K is the number of unique classes in the response. The order of the elements of Prior corresponds to the order of the classes in ClassNames.
Data Types: double
ResponseName — Name of the response variable
This property is read-only.
Name of the response variable, returned as a character vector.
Data Types: char
ScoreTransform — Score transformation
Function for transforming scores, specified as a function handle or the name of a built-in transformation function. "none" means no transformation; equivalently, "none" means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see ScoreTransform (for trees) or ScoreTransform (for ensembles).
Add or change a ScoreTransform function using dot notation:
Mdl.ScoreTransform = "function" % or Mdl.ScoreTransform = @function
Data Types: char | string | function_handle
Trained — Trained weak learners
This property is read-only.
Trained weak learners, returned as a cell vector. The entries of the cell vector contain the corresponding compact classification models.
Data Types: cell
TrainedWeights — Trained weak learner weights
This property is read-only.
Trained weak learner weights, returned as a numeric vector. TrainedWeights has NumTrained elements, where NumTrained is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.
Data Types: double
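As a small sketch (CMdl is a hypothetical workspace variable holding a trained compact ensemble), you can confirm that there is one weight per weak learner:

```matlab
% Assumes CMdl is a compact classification ensemble (hypothetical variable).
% One weight per weak learner:
isequal(numel(CMdl.TrainedWeights),CMdl.NumTrained)  % logical 1
% Inspect the weights of the first few learners:
CMdl.TrainedWeights(1:min(5,CMdl.NumTrained))
```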
UsePredForLearner — Predictor-use indicator
This property is read-only.
Indicator that learner j uses predictor i, returned as a logical matrix of size P-by-NumTrained, where P is the number of predictors (columns) in the training data. UsePredForLearner(i,j) is true when learner j uses predictor i, and is false otherwise. For each learner, the predictors have the same order as the columns in the training data.
If the ensemble is not of type Subspace, all entries in UsePredForLearner are true.
Data Types: logical
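For example, assuming a compact ensemble CMdl (a hypothetical workspace variable), you can list the predictors that a particular learner uses:

```matlab
% Assumes CMdl is a compact classification ensemble (hypothetical variable).
usedIdx = find(CMdl.UsePredForLearner(:,1)); % predictors used by learner 1
% For ensembles that are not of type Subspace, this returns every predictor
% index, because all entries of UsePredForLearner are true.
```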
Object Functions
compareHoldout | Compare accuracies of two classification models using new data
edge | Classification edge for classification ensemble model
gather | Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime | Local interpretable model-agnostic explanations (LIME)
loss | Classification loss for classification ensemble model
margin | Classification margins for classification ensemble model
partialDependence | Compute partial dependence
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict | Predict labels using classification ensemble model
predictorImportance | Estimates of predictor importance for classification ensemble of decision trees
removeLearners | Remove members of compact classification ensemble
shapley | Shapley values
Examples
Create a compact classification ensemble for efficiently making predictions on new data.
Load the ionosphere data set.
load ionosphere
Train a boosted ensemble of 100 classification trees using all measurements and the AdaBoostM1 method.
Mdl = fitcensemble(X,Y,Method="AdaBoostM1")
Mdl = 
  ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'AdaBoostM1'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100×1 double]
       FitInfoDescription: {2×1 cell}

  Properties, Methods
Mdl is a ClassificationEnsemble model object that contains the training data, among other things.
Create a compact version of Mdl.
CMdl = compact(Mdl)
CMdl = 
  CompactClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
               NumTrained: 100

  Properties, Methods
CMdl is a CompactClassificationEnsemble model object. CMdl is almost the same as Mdl. One exception is that CMdl does not store the training data.
Compare the amounts of space consumed by Mdl and CMdl.
mdlInfo = whos("Mdl"); cMdlInfo = whos("CMdl"); [mdlInfo.bytes cMdlInfo.bytes]
ans = 1×2
864863 617126
Mdl consumes more space than CMdl.
CMdl.Trained stores the trained classification trees (CompactClassificationTree model objects) that compose Mdl.
Display a graph of the first tree in the compact ensemble.
view(CMdl.Trained{1},Mode="graph");
By default, fitcensemble grows shallow trees for boosted ensembles of trees.
Predict the label of the mean of X using the compact ensemble.
predMeanX = predict(CMdl,mean(X))
predMeanX = 1×1 cell array
{'g'}
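predict can also return the classification scores. This sketch extends the example above; the output variable names are illustrative.

```matlab
% Second output: one score column per class, ordered as in CMdl.ClassNames.
[predLabel,predScore] = predict(CMdl,mean(X));
```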
Tips
For an ensemble of classification trees, the Trained property of ens stores an ens.NumTrained-by-1 cell vector of compact classification models. For a textual or graphical display of tree t in the cell vector, enter:
view(ens.Trained{t}.CompactRegressionLearner) for ensembles aggregated using LogitBoost or GentleBoost.
view(ens.Trained{t}) for all other aggregation methods.
Extended Capabilities
Usage notes and limitations:
The predict function supports code generation.
To integrate the prediction of an ensemble into Simulink®, you can use the ClassificationEnsemble Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train an ensemble by using fitcensemble, the following restrictions apply.
The value of the ScoreTransform name-value argument cannot be an anonymous function.
Code generation limitations for the weak learners used in the ensemble also apply to the ensemble.
For decision tree weak learners, you cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'.
For k-nearest neighbor weak learners, the value of the Distance name-value argument cannot be a custom distance function. The value of the DistanceWeight name-value argument can be a custom distance weight function, but it cannot be an anonymous function.
For fixed-point code generation, the following additional restrictions apply.
When you train an ensemble by using fitcensemble, you must train an ensemble using tree learners, and the ScoreTransform value cannot be 'invlogit'.
Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.
Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type.
For more information, see Introduction to Code Generation.
Usage notes and limitations:
The following object functions fully support GPU arrays:
The following object functions offer limited support for GPU arrays:
The object functions execute on a GPU if at least one of the following applies:
The model was fitted with GPU arrays.
The predictor data that you pass to the object function is a GPU array.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2011a
Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss function.
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.
For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities used for training in the Prior property. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities that do not reflect the cost penalties. For more details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.
Some object functions use the Cost and Prior properties:
The loss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data.
If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.
If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
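As a sketch of the R2022a workflow (X, Y, and the cost matrix C are assumed inputs, not defined here), computing the observed misclassification cost with the stored user-specified cost matrix looks like:

```matlab
% Assumes predictor matrix X, class labels Y, and a nondefault cost matrix C.
Mdl = fitcensemble(X,Y,Method="AdaBoostM1",Cost=C);
observedCost = loss(Mdl,X,Y,LossFun="classifcost");
```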
See Also
fitcensemble | ClassificationEnsemble | ClassificationBaggedEnsemble | predict | compact | fitctree | view | compareHoldout