KDTreeSearcher

Create Kd-tree nearest neighbor searcher

Description

KDTreeSearcher model objects store the results of a nearest neighbor search that uses the Kd-tree algorithm. Results include the training data, distance metric and its parameters, and maximum number of data points in each leaf node (that is, the bucket size). The Kd-tree algorithm partitions an n-by-K data set by recursively splitting n points in K-dimensional space into a binary tree.

Once you create a KDTreeSearcher model object, you can search the stored tree to find all neighboring points to the query data by performing a nearest neighbor search using knnsearch or a radius search using rangesearch. The Kd-tree algorithm is more efficient than the exhaustive search algorithm when K is small (that is, K ≤ 10), the training and query sets are not sparse, and the training and query sets have many observations.

Creation

Use either the createns function or the KDTreeSearcher function (described here) to create a KDTreeSearcher model object. Both functions use the same syntax except that the createns function has the 'NSMethod' name-value pair argument, which you use to choose the nearest neighbor search method. The createns function also creates an ExhaustiveSearcher object. Specify 'NSMethod','kdtree' to create a KDTreeSearcher object. The default is 'kdtree' if K ≤ 10, the training data is not sparse, and the distance metric is Euclidean, city block, Chebychev, or Minkowski.

Description

example

Mdl = KDTreeSearcher(X) grows a default Kd-tree (Mdl) using the n-by-K numeric matrix of training data (X).

example

Mdl = KDTreeSearcher(X,Name,Value) specifies additional options using one or more name-value pair arguments. You can specify the maximum number of data points in each leaf node (that is, the bucket size) and the distance metric, and set the distance metric parameter (DistParameter) property. For example, KDTreeSearcher(X,'Distance','minkowski','BucketSize',10) specifies to use the Minkowski distance when searching for nearest neighbors and to use 10 for the bucket size. To specify DistParameter, use the P name-value pair argument.

Input Arguments

expand all

Training data that grows the Kd-tree, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Distance','minkowski','P',3,'BucketSize',10 specifies to use the following when searching for nearest neighbors: the Minkowski distance, 3 for the Minkowski distance metric exponent, and 10 for the bucket size.

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as the comma-separated pair consisting of 'Distance' and one of these values.

ValueDescription
'chebychev'Chebychev distance (maximum coordinate difference).
'cityblock'City block distance.
'euclidean'Euclidean distance.
'minkowski'Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair argument.

For more details, see Distance Metrics.

The software does not use the distance metric for creating a KDTreeSearcher model object, so you can alter the distance metric by using dot notation after creating the object.

Example: 'Distance','minkowski'

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'.

Example: 'P',3

Data Types: single | double

Maximum number of data points in each leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer.

Example: 'BucketSize',10

Data Types: single | double

Properties

expand all

Training data that grows the Kd-tree, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

The input argument X of createns or KDTreeSearcher sets this property.

Data Types: single | double

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as 'chebychev', 'cityblock', 'euclidean', or 'minkowski'.

The 'Distance' name-value pair argument of createns or KDTreeSearcher sets this property.

The software does not use the distance metric for creating a KDTreeSearcher model object, so you can alter it by using dot notation.

Distance metric parameter values, specified as empty ([]) or a positive scalar.

If Distance is 'minkowski', then DistParameter is the exponent in the Minkowski distance formula. Otherwise, DistParameter is [], indicating that the specified distance metric formula has no parameters.

The 'P' name-value pair argument of createns or KDTreeSearcher sets this property.

You can alter DistParameter by using dot notation, for example, Mdl.DistParameter = PNew, where PNew is a positive scalar.

Data Types: single | double

Maximum number of data points in each leaf node of the Kd-tree, specified as a positive integer.

The 'BucketSize' name-value pair argument of createns or KDTreeSearcher sets this property.

Data Types: single | double

Object Functions

 knnsearch Find k-nearest neighbors using searcher object rangesearch Find all neighbors within specified distance using searcher object

Examples

collapse all

Grow a four-dimensional Kd-tree that uses the Euclidean distance.

X = meas;
[n,k] = size(X)
n = 150
k = 4

X has 150 observations and 4 predictors.

Grow a four-dimensional Kd-tree using the entire data set as training data.

Mdl1 = KDTreeSearcher(X)
Mdl1 =
KDTreeSearcher with properties:

BucketSize: 50
Distance: 'euclidean'
DistParameter: []
X: [150x4 double]

Mdl1 is a KDTreeSearcher model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional Kd-tree, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can grow a Kd-tree by using createns.

Mdl2 = createns(X)
Mdl2 =
KDTreeSearcher with properties:

BucketSize: 50
Distance: 'euclidean'
DistParameter: []
X: [150x4 double]

Mdl2 is also a KDTreeSearcher model object, and it is equivalent to Mdl1. Because X has four columns and the default distance metric is Euclidean, createns creates a KDTreeSearcher model by default.

To find the nearest neighbors in X to a batch of query data, pass the KDTreeSearcher model object and the query data to knnsearch or rangesearch.

Load Fisher's iris data. Focus on the petal dimensions.

X = meas(:,[3 4]); % Predictors

Grow a two-dimensional Kd-tree using createns and the training data. Specify the Minkowski distance metric.

Mdl = createns(X,'Distance','Minkowski')
Mdl =
KDTreeSearcher with properties:

BucketSize: 50
Distance: 'minkowski'
DistParameter: 2
X: [150x2 double]

Because X has two columns and the distance metric is Minkowski, createns creates a KDTreeSearcher model object by default.

Access properties of Mdl by using dot notation. For example, use Mdl.DistParameter to access the Minkowski distance exponent.

Mdl.DistParameter
ans = 2

You can pass query data and Mdl to:

• knnsearch to find indices and distances of nearest neighbors

• rangesearch to find indices of all nearest neighbors within a distance that you specify

Create a KDTreeSearcher model object and alter the Distance property by using dot notation.

X = meas;

Grow a default four-dimensional Kd-tree using the entire data set as training data.

Mdl = KDTreeSearcher(X)
Mdl =
KDTreeSearcher with properties:

BucketSize: 50
Distance: 'euclidean'
DistParameter: []
X: [150x4 double]

Specify that the neighbor searcher use the Minkowski metric to compute the distances between the training and query data.

Mdl.Distance = 'minkowski'
Mdl =
KDTreeSearcher with properties:

BucketSize: 50
Distance: 'minkowski'
DistParameter: 2
X: [150x4 double]

You can pass Mdl and the query data to either knnsearch or rangesearch to find the nearest neighbors to the points in the query data based on the Minkowski distance.

Grow a Kd-tree nearest neighbor searcher object by using the createns function. Pass the object and query data to the knnsearch function to find k-nearest neighbors.

Remove five irises randomly from the predictor data to use as a query set.

rng(1);                     % For reproducibility
n = size(meas,1);           % Sample size
qIdx = randsample(n,5);     % Indices of query data
tIdx = ~ismember(1:n,qIdx); % Indices of training data
Q = meas(qIdx,:);
X = meas(tIdx,:);

Grow a four-dimensional Kd-tree using the training data. Specify the Minkowski distance for finding nearest neighbors.

Mdl = createns(X,'Distance','minkowski')
Mdl =
KDTreeSearcher with properties:

BucketSize: 50
Distance: 'minkowski'
DistParameter: 2
X: [145x4 double]

Because X has four columns and the distance metric is Minkowski, createns creates a KDTreeSearcher model object by default. The Minkowski distance exponent is 2 by default.

Find the indices of the training data (Mdl.X) that are the two nearest neighbors of each point in the query data (Q).

IdxNN = knnsearch(Mdl,Q,'K',2)
IdxNN = 5×2

17     4
6     2
1    12
89    66
124   100

Each row of IdxNN corresponds to a query data observation, and the column order corresponds to the order of the nearest neighbors, with respect to ascending distance. For example, based on the Minkowski distance, the second nearest neighbor of Q(3,:) is X(12,:).