templateKNN
k-nearest neighbor classifier template

Description

t = templateKNN()
returns a k-nearest neighbor (KNN) learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models. If you specify a default template, then the software uses default values for all input arguments during training. Specify t as a learner in fitcensemble or fitcecoc.

t = templateKNN(Name=Value)
creates a template with additional options specified by one or more name-value arguments. For example, you can specify the nearest neighbor search method, the number of nearest neighbors to find, or the distance metric.

If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value arguments. During training, the software uses default values for empty options.
Examples

Create a nondefault k-nearest neighbor template for use in fitcensemble.
Load Fisher's iris data set.
load fisheriris
Create a template for a 5-nearest neighbor search, and specify to standardize the predictors.
t = templateKNN(NumNeighbors=5,Standardize=true)

t =
Fit template for classification KNN.

       NumNeighbors: 5
           NSMethod: ''
           Distance: ''
         BucketSize: []
        IncludeTies: []
     DistanceWeight: []
          BreakTies: ''
           Exponent: []
                Cov: []
              Scale: []
    StandardizeData: 1
          CacheSize: 1000
            Version: 1
             Method: 'KNN'
               Type: 'classification'
All properties of the template object are empty except for NumNeighbors, Method, StandardizeData, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.

Specify t as a weak learner for a classification ensemble.

Mdl = fitcensemble(meas,species, ...
    Method="Subspace",Learners=t);
Display the in-sample (resubstitution) misclassification error.
L = resubLoss(Mdl)
L = 0.0600
Create a nondefault k-nearest neighbor template for use in fitcecoc.
Load Fisher's iris data set.
load fisheriris
Create a template for a 5-nearest neighbor search, and specify to standardize the predictors.
t = templateKNN(NumNeighbors=5,Standardize=true)

t =
Fit template for classification KNN.

       NumNeighbors: 5
           NSMethod: ''
           Distance: ''
         BucketSize: []
        IncludeTies: []
     DistanceWeight: []
          BreakTies: ''
           Exponent: []
                Cov: []
              Scale: []
    StandardizeData: 1
          CacheSize: 1000
            Version: 1
             Method: 'KNN'
               Type: 'classification'
All properties of the template object are empty except for NumNeighbors, Method, StandardizeData, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.

Specify t as a binary learner for an ECOC multiclass model.

Mdl = fitcecoc(meas,species,Learners=t);

By default, the software trains Mdl using the one-versus-one coding design.
Display the in-sample (resubstitution) misclassification error.
L = resubLoss(Mdl,LossFun="classiferror")
L = 0.0467
Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: templateKNN(NumNeighbors=4,Distance="minkowski") specifies a 4-nearest neighbor classifier template using the Minkowski distance measure.
BreakTies
Tie-breaking algorithm used by the predict method if multiple classes have the same smallest cost, specified as one of the following:

- "smallest" — Use the smallest index among tied groups.
- "nearest" — Use the class with the nearest neighbor among tied groups.
- "random" — Use a random tiebreaker among tied groups.

By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors.

Example: BreakTies="nearest"
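As a sketch, a template that resolves cost ties by the single nearest neighbor, used as a binary learner in an ECOC model (this assumes the Fisher iris variables meas and species, as in the examples above):

```matlab
% Load the example data used throughout this page.
load fisheriris

% Break ties among equally costly classes using the nearest neighbor.
t = templateKNN(NumNeighbors=4,BreakTies="nearest");

% Use the template as the binary learner in an ECOC multiclass model.
Mdl = fitcecoc(meas,species,Learners=t);
```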
BucketSize
Maximum number of data points in the leaf node of the Kd-tree, specified as a positive integer value. This argument is meaningful only when NSMethod is "kdtree".

Example: BucketSize=40

Data Types: single | double
Cov
Covariance matrix, specified as a positive definite matrix of scalar values representing the covariance matrix when computing the Mahalanobis distance. This argument is valid only when Distance is "mahalanobis".

You cannot simultaneously specify Standardize and either Scale or Cov.

Data Types: single | double
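For illustration, a Mahalanobis-distance template with a user-supplied covariance matrix might be sketched as follows. The matrix C here is hypothetical; it must be positive definite and sized to match the number of predictor columns:

```matlab
% Hypothetical 2-by-2 covariance matrix for two predictor columns.
C = [1 0.5; 0.5 2];

% KNN template that computes Mahalanobis distance using C.
t = templateKNN(Distance="mahalanobis",Cov=C);
```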
Distance
Distance metric, specified as a valid distance metric name or function handle. The allowable distance metric names depend on your choice of a neighbor-searcher method (see NSMethod).

| NSMethod Value | Distance Metric Names |
|---|---|
| "exhaustive" | Any distance metric of ExhaustiveSearcher |
| "kdtree" | "cityblock", "chebychev", "euclidean", or "minkowski" |

This table includes valid distance metrics of ExhaustiveSearcher.
| Distance Metric Names | Description |
|---|---|
| "cityblock" | City block distance. |
| "chebychev" | Chebychev distance (maximum coordinate difference). |
| "correlation" | One minus the sample linear correlation between observations (treated as sequences of values). |
| "cosine" | One minus the cosine of the included angle between observations (treated as vectors). |
| "euclidean" | Euclidean distance. |
| "hamming" | Hamming distance, percentage of coordinates that differ. |
| "jaccard" | One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ. |
| "mahalanobis" | Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by cov(X,"omitrows"). To specify a different value for C, use the Cov name-value argument. |
| "minkowski" | Minkowski distance. The default exponent is 2. To specify a different exponent, use the Exponent name-value argument. |
| "seuclidean" | Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = std(X,"omitnan"). To specify another value for S, use the Scale name-value argument. |
| "spearman" | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
| @distfun | Distance function handle, with signature function D2 = distfun(ZI,ZJ). |
When you call fitcensemble or fitcecoc, if you specify Learners as a templateKNN object and CategoricalPredictors as "all", then the default distance metric is "hamming". Otherwise, the default distance metric is "euclidean".

Change Distance using dot notation: mdl.Distance = newDistance.

If NSMethod is "kdtree", you can use dot notation to change Distance only for the metrics "cityblock", "chebychev", "euclidean", and "minkowski".

For definitions, see Distance Metrics.

Example: Distance="minkowski"

Data Types: char | string | function_handle
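As a sketch of the function-handle form, the hypothetical weightedL1 function below computes a per-column weighted city block distance (the weights w are illustrative and assume four predictor columns). ZI is a 1-by-n query observation, ZJ is an m2-by-n matrix of observations, and the function must return an m2-by-1 vector of distances. Save it in a file on the MATLAB path; function-handle metrics require the exhaustive searcher:

```matlab
function D2 = weightedL1(ZI,ZJ)
% Hypothetical weighted city block distance.
% ZI: 1-by-n query observation; ZJ: m2-by-n matrix of observations.
w = [1 1 2 2];                   % assumed per-column weights (n = 4)
D2 = sum(abs(ZI - ZJ).*w, 2);    % m2-by-1 vector of distances
end
```

Then pass the handle to the template, for example t = templateKNN(Distance=@weightedL1,NSMethod="exhaustive").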
DistanceWeight
Distance weighting function, specified as a function handle or one of the values in this table.

| Value | Description |
|---|---|
| "equal" | No weighting |
| "inverse" | Weight is 1/distance |
| "squaredinverse" | Weight is 1/distance^2 |
| @fcn | fcn is a function that accepts a matrix of nonnegative distances, and returns a matrix of the same size containing nonnegative distance weights. For example, "squaredinverse" is equivalent to @(d)d.^(-2). |

Example: DistanceWeight="inverse"

Data Types: char | string | function_handle
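For example, a sketch of a custom weighting scheme: an exponentially decaying weight passed as a function handle. The handle must map a matrix of nonnegative distances to a same-size matrix of nonnegative weights:

```matlab
% Exponentially decaying distance weights (an alternative to "inverse").
t = templateKNN(NumNeighbors=5,DistanceWeight=@(d) exp(-d));
```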
Exponent
Minkowski distance exponent, specified as a positive scalar value. This argument is valid only when Distance is "minkowski".

Example: Exponent=3

Data Types: single | double
IncludeTies
Tie inclusion flag, specified as a logical value indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise, predict uses exactly k neighbors.

Example: IncludeTies=true

Data Types: logical
NSMethod
Nearest neighbor search method, specified as "kdtree" or "exhaustive".

- "kdtree" — Creates and uses a Kd-tree to find nearest neighbors. "kdtree" is valid when the distance metric is one of the following: "euclidean", "cityblock", "minkowski", or "chebychev".
- "exhaustive" — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors.

The default is "kdtree" when X has 10 or fewer columns, X is not sparse or a gpuArray, and the distance metric is a "kdtree" type; otherwise, "exhaustive".

Example: NSMethod="exhaustive"
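As an illustrative sketch, a metric that the Kd-tree does not support, such as "cosine", requires the exhaustive searcher; specifying both explicitly avoids relying on the default selection:

```matlab
% Cosine distance is not a "kdtree" metric, so use exhaustive search.
t = templateKNN(Distance="cosine",NSMethod="exhaustive");
```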
NumNeighbors
Number of nearest neighbors in X to find for classifying each point when predicting, specified as a positive integer value.

Example: NumNeighbors=3

Data Types: single | double
Scale
Distance scale, specified as a vector of nonnegative scalar values with length equal to the number of columns in X. Each coordinate difference between X and a query point is scaled by the corresponding element of Scale. This argument is valid only when Distance is "seuclidean".

You cannot simultaneously specify Standardize and either Scale or Cov.

Data Types: single | double
Standardize
Flag to standardize the predictors, specified as true (1) or false (0).

If you set Standardize=true, then the software centers and scales each column of the predictor data (X) by the column mean and standard deviation, respectively.

The software does not standardize categorical predictors, and throws an error if all predictors are categorical.

You cannot simultaneously specify Standardize=true and either Scale or Cov.

It is good practice to standardize the predictor data.

Example: Standardize=true

Data Types: logical
Output Arguments

t
KNN classification template suitable for training ensembles or error-correcting output code (ECOC) multiclass models, returned as a template object. Pass t to fitcensemble or fitcecoc to specify how to create the KNN classifier for the ensemble or ECOC model, respectively.

If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.
Version History
Introduced in R2014a
See Also

ClassificationKNN | ExhaustiveSearcher | fitcensemble | fitcecoc