# ClassificationNaiveBayes

Naive Bayes classification for multiclass classification

## Description

`ClassificationNaiveBayes` is a Naive Bayes classifier for multiclass learning. Trained `ClassificationNaiveBayes` classifiers store the training data, parameter values, data distribution, and prior probabilities. Use these classifiers to perform tasks such as estimating resubstitution predictions (see `resubPredict`) and predicting labels or posterior probabilities for new data (see `predict`).

## Creation

Create a `ClassificationNaiveBayes` object by using `fitcnb`.

## Properties


### Predictor Properties

Predictor names, specified as a cell array of character vectors. The order of the elements in `PredictorNames` corresponds to the order in which the predictor names appear in the training data `X`.

Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values corresponding to the columns of predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty (`[]`).

Data Types: `single` | `double`

Multivariate multinomial levels, specified as a cell array. The length of `CategoricalLevels` is equal to the number of predictors (`size(X,2)`).

The cells of `CategoricalLevels` correspond to predictors that you specify as `'mvmn'` during training, that is, they have a multivariate multinomial distribution. Cells that do not correspond to a multivariate multinomial distribution are empty (`[]`).

If predictor j is multivariate multinomial, then `CategoricalLevels{j}` is a list of all distinct values of predictor j in the sample. `NaN`s are removed from `unique(X(:,j))`.
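As a minimal sketch of how `CategoricalLevels` is laid out, the following trains a classifier on a small made-up data set in which the first predictor is modeled as `'mvmn'` and the second as `'normal'`. The data values and variable names are illustrative assumptions, not part of any shipped data set.

```matlab
% Hypothetical toy data: predictor 1 takes a few discrete values,
% predictor 2 is continuous.
X = [1 10; 2 20; 1 30; 3 10; 2 20; 3 30];
Y = [1; 1; 1; 2; 2; 2];

% Model predictor 1 as multivariate multinomial and predictor 2 as normal.
Mdl = fitcnb(X,Y,'DistributionNames',{'mvmn','normal'});

% One cell per predictor: levels for the 'mvmn' predictor, [] otherwise.
levelsPredictor1 = Mdl.CategoricalLevels{1};   % distinct values of X(:,1)
levelsPredictor2 = Mdl.CategoricalLevels{2};   % empty, because not 'mvmn'
```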

Unstandardized predictors used to train the naive Bayes classifier, specified as a numeric matrix. Each row of `X` corresponds to one observation, and each column corresponds to one variable. The software excludes observations containing at least one missing value, and removes corresponding elements from Y.

### Predictor Distribution Properties

Predictor distributions, specified as a character vector or cell array of character vectors. `fitcnb` uses the predictor distributions to model the predictors. This table lists the available distributions.

| Value | Description |
| --- | --- |
| `'kernel'` | Kernel smoothing density estimate |
| `'mn'` | Multinomial distribution. If you specify `'mn'`, then all features are components of a multinomial distribution. Therefore, you cannot include `'mn'` as an element of a string array or a cell array of character vectors. For details, see Estimated Probability for Multinomial Distribution. |
| `'mvmn'` | Multivariate multinomial distribution. For details, see Estimated Probability for Multivariate Multinomial Distribution. |
| `'normal'` | Normal (Gaussian) distribution |

If `DistributionNames` is a 1-by-P cell array of character vectors, then `fitcnb` models the feature j using the distribution in element j of the cell array.

Example: `'mn'`

Example: `{'kernel','normal','kernel'}`

Data Types: `char` | `string` | `cell`
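Because `'mn'` treats all predictors as components of one multinomial distribution, it is passed as a single value rather than as an element of a cell array. The sketch below illustrates this on made-up token counts; the data and variable names are assumptions chosen only for illustration.

```matlab
% Hypothetical token counts: rows are documents, columns are tokens.
X = [3 0 1; 2 1 0; 0 4 2; 1 3 3];
Y = {'a'; 'a'; 'b'; 'b'};

% 'mn' is given as a single value, so it applies to all predictors jointly.
Mdl = fitcnb(X,Y,'DistributionNames','mn');

% Each cell of DistributionParameters holds one scalar token probability.
tokenProbsClassA = [Mdl.DistributionParameters{1,:}];
```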

Distribution parameter estimates, specified as a cell array. `DistributionParameters` is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property `ClassNames`, and the order of the predictors corresponds to the order of the columns of `X`.

If class `k` has no observations for predictor `j`, then `DistributionParameters{k,j}` is empty (`[]`).

The elements of `DistributionParameters` depend on the distributions of the predictors. This table describes the values in `DistributionParameters{k,j}`.

| Distribution of Predictor `j` | Value of Cell Array for Predictor `j` and Class `k` |
| --- | --- |
| `kernel` | A `KernelDistribution` model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use `Mdl.DistributionParameters{3,2}.BandWidth`. |
| `mn` | A scalar representing the probability that token j appears in class k. For details, see Estimated Probability for Multinomial Distribution. |
| `mvmn` | A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property `CategoricalLevels`). For more details, see Estimated Probability for Multivariate Multinomial Distribution. |
| `normal` | A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. |
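The cell indexing described above can be sketched as follows for a `normal` predictor, using the `fisheriris` data set. The comparison against a direct `mean` computation is only an illustrative sanity check, and the variable names are assumptions.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);   % all predictors use 'normal' by default

% Row 1 is the first class in Mdl.ClassNames, column 2 is the second predictor.
params = Mdl.DistributionParameters{1,2};   % [mean; standard deviation]

% Compare the stored mean with a direct computation on the same class.
inClass = strcmp(species,Mdl.ClassNames{1});
directMean = mean(meas(inClass,2));
meanDifference = abs(params(1) - directMean);   % expected to be near zero
```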

Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of `Kernel` is equal to the number of predictors (`size(X,2)`). `Kernel{j}` corresponds to predictor j and contains a character vector describing the type of kernel smoother. If a cell is empty (`[]`), then `fitcnb` did not fit a kernel distribution to the corresponding predictor.

This table describes the supported kernel smoother types. I{u} denotes the indicator function.

| Value | Kernel | Formula |
| --- | --- | --- |
| `'box'` | Box (uniform) | $f(x)=0.5\,I\{\lvert x\rvert\le 1\}$ |
| `'epanechnikov'` | Epanechnikov | $f(x)=0.75\left(1-x^{2}\right)I\{\lvert x\rvert\le 1\}$ |
| `'normal'` | Gaussian | $f(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-0.5x^{2}\right)$ |
| `'triangle'` | Triangular | $f(x)=\left(1-\lvert x\rvert\right)I\{\lvert x\rvert\le 1\}$ |

Example: `'box'`

Example: `{'epanechnikov','normal'}`

Data Types: `char` | `string` | `cell`
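To make the kernel formulas concrete, the following sketch writes the four supported kernels as anonymous functions and checks numerically that each integrates to approximately 1. The function names are illustrative assumptions; `fitcnb` does not expose the kernels this way.

```matlab
% Kernel smoother formulas written as anonymous functions.
boxKernel      = @(x) 0.5 .* (abs(x) <= 1);
epanechnikov   = @(x) 0.75 .* (1 - x.^2) .* (abs(x) <= 1);
gaussianKernel = @(x) exp(-0.5 .* x.^2) ./ sqrt(2*pi);
triangleKernel = @(x) (1 - abs(x)) .* (abs(x) <= 1);

% Each kernel is a density, so its integral should be close to 1.
areas = [integral(boxKernel,-1,1), ...
         integral(epanechnikov,-1,1), ...
         integral(gaussianKernel,-Inf,Inf), ...
         integral(triangleKernel,-1,1)];
```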

Kernel smoother density support, specified as a cell array. The length of `Support` is equal to the number of predictors (`size(X,2)`). The cells represent the regions to which `fitcnb` applies the kernel density. If a cell is empty (`[]`), then `fitcnb` did not fit a kernel distribution to the corresponding predictor.

This table describes the supported options.

| Value | Description |
| --- | --- |
| 1-by-2 numeric row vector | The density support applies to the specified bounds, for example `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively. |
| `'positive'` | The density support applies to all positive real values. |
| `'unbounded'` | The density support applies to all real values. |

Kernel smoother window width, specified as a numeric matrix. `Width` is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (`size(X,2)`).

`Width(k,j)` is the kernel smoother window width for the kernel smoothing density of predictor `j` within class `k`. `NaN`s in column `j` indicate that `fitcnb` did not fit predictor `j` using a kernel density.
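The relationship among `Kernel`, `Support`, and `Width` can be sketched by fitting kernel densities to only some of the predictors. The choice of predictors, kernel type, and support below is an arbitrary assumption made for illustration.

```matlab
load fisheriris

% Fit kernel densities to predictors 1 and 3, normal distributions to 2 and 4.
Mdl = fitcnb(meas,species, ...
    'DistributionNames',{'kernel','normal','kernel','normal'}, ...
    'Kernel','epanechnikov', ...
    'Support','positive');

kernelTypes  = Mdl.Kernel;    % one cell per predictor
supportTypes = Mdl.Support;   % one cell per predictor
widths       = Mdl.Width;     % K-by-P matrix of window widths

% Columns for the 'normal' predictors should contain only NaN values.
normalColumnsAreNaN = all(all(isnan(widths(:,[2 4]))));
```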

### Response Properties

Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

`ClassNames` has the same data type as `Y`, and has K elements (or rows) for character arrays. (The software treats string arrays as cell arrays of character vectors.)

Data Types: `categorical` | `char` | `string` | `logical` | `double` | `cell`

Response variable name (`ResponseName`), specified as a character vector.

Data Types: `char` | `string`

Class labels used to train the naive Bayes classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of `Y` represents the observed classification of the corresponding row of `X`.

`Y` has the same data type as the data in `Y` used for training the model. (The software treats string arrays as cell arrays of character vectors.)

Data Types: `single` | `double` | `logical` | `char` | `string` | `cell` | `categorical`

### Training Properties

Parameter values used to train the `ClassificationNaiveBayes` model, specified as an object. `ModelParameters` contains parameter values such as the name-value pair argument values used to train the naive Bayes classifier.

Access the properties of `ModelParameters` by using dot notation. For example, access the kernel support using `Mdl.ModelParameters.Support`.

Number of training observations (`NumObservations`) in the training data stored in `X` and `Y`, specified as a numeric scalar.

Prior probabilities, specified as a numeric vector. The order of the elements in `Prior` corresponds to the elements of `Mdl.ClassNames`.

`fitcnb` normalizes the prior probabilities you set using the `'Prior'` name-value pair argument, so that `sum(Prior)` = `1`.

The value of `Prior` does not affect the best-fitting model. Therefore, you can reset `Prior` after training `Mdl` using dot notation.

Example: `Mdl.Prior = [0.2 0.8]`

Data Types: `double` | `single`
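A minimal sketch of the normalization and reset behavior described above, using the `fisheriris` data set. The prior values are arbitrary assumptions chosen for illustration.

```matlab
load fisheriris

% Hypothetical unnormalized prior weights, in the order of the class names.
Mdl = fitcnb(meas,species, ...
    'ClassNames',{'setosa','versicolor','virginica'}, ...
    'Prior',[1 2 1]);
storedPrior = Mdl.Prior;      % normalized so that the elements sum to 1

% Reset the prior after training; no retraining is needed.
Mdl.Prior = [0.5 0.2 0.3];
```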

Observation weights, specified as a vector of nonnegative values with the same number of rows as `Y`. Each entry in `W` specifies the relative importance of the corresponding observation in `Y`. `fitcnb` normalizes the value you set for the `'Weights'` name-value pair argument, so that the weights within a particular class sum to the prior probability for that class.
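The weight normalization can be checked with a short sketch: after training with arbitrary positive weights, the stored weights in `W` summed within each class should match `Prior`. The random weights and variable names here are assumptions for illustration.

```matlab
load fisheriris
rng('default')                       % for reproducibility
w = rand(size(meas,1),1) + 0.5;      % hypothetical positive weights

Mdl = fitcnb(meas,species,'Weights',w);

% Sum the stored weights within each class, in the order of Mdl.ClassNames.
[~,classIdx] = ismember(Mdl.Y,Mdl.ClassNames);
classWeightSums = accumarray(classIdx,Mdl.W)';

% The difference from the prior is expected to be negligible.
maxDifference = max(abs(classWeightSums - Mdl.Prior));
```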

### Classifier Properties

Misclassification cost, specified as a numeric square matrix, where `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

The misclassification cost matrix must have zeros on the diagonal.

The value of `Cost` does not influence training. You can reset `Cost` after training `Mdl` using dot notation.

Example: `Mdl.Cost = [0 0.5 ; 1 0]`

Data Types: `double` | `single`
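Because `Cost` does not influence training, it can be changed on an existing model and takes effect at prediction time. The sketch below uses an arbitrary, hypothetical cost matrix and the third output of `predict` (the expected misclassification cost) to show that the predicted label is the class with the lowest expected cost.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);

% Hypothetical costs: misclassifying class 2 as class 3 is penalized heavily.
Mdl.Cost = [0 1 1; 1 0 5; 1 1 0];

[labels,posterior,expectedCost] = predict(Mdl,meas);

% The predicted label should be the class with the minimum expected cost.
[~,minIdx] = min(expectedCost,[],2);
labelsMatchMinCost = isequal(labels,Mdl.ClassNames(minIdx));
```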

Cross-validation optimization of hyperparameters, specified as a `BayesianOptimization` object or a table of hyperparameters and associated values. This property is nonempty if the `'OptimizeHyperparameters'` name-value pair argument is nonempty when you create the model. The value of `HyperparameterOptimizationResults` depends on the setting of the `Optimizer` field in the `HyperparameterOptimizationOptions` structure when you create the model.

| Value of `Optimizer` Field | Value of `HyperparameterOptimizationResults` |
| --- | --- |
| `'bayesopt'` (default) | Object of class `BayesianOptimization` |
| `'gridsearch'` or `'randomsearch'` | Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) |

Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.

| Value | Description |
| --- | --- |
| `'doublelogit'` | $1/(1 + e^{-2x})$ |
| `'invlogit'` | $\log(x / (1 - x))$ |
| `'ismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
| `'logit'` | $1/(1 + e^{-x})$ |
| `'none'` or `'identity'` | $x$ (no transformation) |
| `'sign'` | $-1$ for $x < 0$; $0$ for $x = 0$; $1$ for $x > 0$ |
| `'symmetric'` | $2x - 1$ |
| `'symmetricismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to $-1$ |
| `'symmetriclogit'` | $2/(1 + e^{-x}) - 1$ |

For a MATLAB® function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: `Mdl.ScoreTransform = 'logit'`

Data Types: `char` | `string` | `function_handle`
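As a sketch of what a score transformation does, the following applies three of the transformations to a small made-up score matrix, writing each as an anonymous function. The logit transformation is taken here as 1/(1 + e^(-x)). The score values and function names are assumptions for illustration only.

```matlab
% Hypothetical score matrix: rows are observations, columns are classes.
scores = [0.9 0.1; 0.4 0.6];

% Each handle accepts a matrix and returns a matrix of the same size.
logitTransform     = @(x) 1 ./ (1 + exp(-x));
symmetricTransform = @(x) 2 .* x - 1;
ismaxTransform     = @(x) double(x == max(x,[],2));

logitScores     = logitTransform(scores);
symmetricScores = symmetricTransform(scores);   % maps [0,1] to [-1,1]
ismaxScores     = ismaxTransform(scores);       % 1 for the largest score per row
```

A handle of this form can be assigned to the property, for example `Mdl.ScoreTransform = symmetricTransform`.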

## Object Functions

| Function | Description |
| --- | --- |
| `compact` | Reduce size of machine learning model |
| `compareHoldout` | Compare accuracies of two classification models using new data |
| `crossval` | Cross-validate machine learning model |
| `edge` | Classification edge for naive Bayes classifier |
| `incrementalLearner` | Convert naive Bayes classification model to incremental learner |
| `lime` | Local interpretable model-agnostic explanations (LIME) |
| `logp` | Log unconditional probability density for naive Bayes classifier |
| `loss` | Classification loss for naive Bayes classifier |
| `margin` | Classification margins for naive Bayes classifier |
| `partialDependence` | Compute partial dependence |
| `plotPartialDependence` | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
| `predict` | Classify observations using naive Bayes classifier |
| `resubEdge` | Resubstitution classification edge |
| `resubLoss` | Resubstitution classification loss |
| `resubMargin` | Resubstitution classification margin |
| `resubPredict` | Classify training data using trained classifier |
| `shapley` | Shapley values |
| `testckfold` | Compare accuracies of two classification models by repeated cross-validation |

## Examples


Create a naive Bayes classifier for Fisher's iris data set. Then, specify prior probabilities after training the classifier.

Load the `fisheriris` data set. Create `X` as a numeric matrix that contains four sepal and petal measurements for 150 irises. Create `Y` as a cell array of character vectors that contains the corresponding iris species.

```
load fisheriris
X = meas;
Y = species;
```

Train a naive Bayes classifier using the predictors `X` and class labels `Y`. `fitcnb` assumes that the predictors are conditionally independent given the class and, by default, fits each predictor using a normal distribution.

`Mdl = fitcnb(X,Y)`
```
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods
```

`Mdl` is a trained `ClassificationNaiveBayes` classifier. Some of the `Mdl` properties appear in the Command Window.

Display the properties of `Mdl` using dot notation. For example, display the class names and prior probabilities.

`Mdl.ClassNames`
```
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```
`Mdl.Prior`
```
ans = 1×3

    0.3333    0.3333    0.3333
```

The order of the class prior probabilities in `Mdl.Prior` corresponds to the order of the classes in `Mdl.ClassNames`. By default, the prior probabilities are the respective relative frequencies of the classes in the data. Alternatively, you can set the prior probabilities when calling `fitcnb` by using the `'Prior'` name-value pair argument.

Set the prior probabilities after training the classifier by using dot notation. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

`Mdl.Prior = [0.5 0.2 0.3];`

You can now use this trained classifier to perform additional tasks. For example, you can label new measurements using `predict` or cross-validate the classifier using `crossval`.
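For instance, labeling a new measurement with `predict` might look like the following sketch. The measurement values in `xNew` are made up for illustration, and the block repeats the training steps so that it runs on its own.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);
Mdl.Prior = [0.5 0.2 0.3];

% Hypothetical new flower: sepal length, sepal width, petal length, petal width.
xNew = [5.9 3.0 5.1 1.8];

% posterior has one column per class, in the order of Mdl.ClassNames.
[label,posterior] = predict(Mdl,xNew);
```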

Train and cross-validate a naive Bayes classifier. When you set `'CrossVal'` to `'on'`, `fitcnb` uses 10-fold cross-validation by default. Then, estimate the cross-validated classification error.

Load the `ionosphere` data set. Remove the first two predictors for stability.

```
load ionosphere
X = X(:,3:end);
rng('default') % for reproducibility
```

Train and cross-validate a naive Bayes classifier using the predictors `X` and class labels `Y`. A recommended practice is to specify the class names. `fitcnb` assumes that each predictor is conditionally and normally distributed.

`CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')`
```
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods
```

`CVMdl` is a `ClassificationPartitionedModel` cross-validated, naive Bayes classifier. Alternatively, you can cross-validate a trained `ClassificationNaiveBayes` model by passing it to `crossval`.

Display the first training fold of `CVMdl` using dot notation.

`CVMdl.Trained{1}`
```
ans = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}

  Properties, Methods
```

Each fold is a `CompactClassificationNaiveBayes` model trained on 90% of the data.

The compact models stored in `CVMdl.Trained` are not intended for predicting on new data. Instead, use them to estimate the generalization error by passing `CVMdl` to `kfoldLoss`.

`genError = kfoldLoss(CVMdl)`
```
genError = 0.1852
```

On average, the generalization error is approximately 19%.

You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
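The alternative route mentioned earlier, training a full model first and then passing it to `crossval`, can be sketched as follows. The choice of five folds and the variable names are arbitrary assumptions for illustration.

```matlab
load ionosphere
X = X(:,3:end);
rng('default')   % for reproducibility

% Train a full model, then cross-validate it as a separate step.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
CVMdl5 = crossval(Mdl,'KFold',5);   % 5 folds is an arbitrary choice here

% Estimate the generalization error from the cross-validated model.
genError5 = kfoldLoss(CVMdl5);
```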

