Rank features for unsupervised learning using Laplacian scores
Load the sample data.
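Assuming the sample data is the ionosphere data set (its 34 features are consistent with the ranked indices shown below; the original page does not name the data set):

load ionosphere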
Rank the features based on importance.
[idx,scores] = fsulaplacian(X);
Create a bar plot of the feature importance scores.
bar(scores(idx))
xlabel('Feature rank')
ylabel('Feature importance score')
Select the top five most important features. Find the columns of these features in X.
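idx(1:5)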
ans = 1×5

    15    13    17    21    19
The 15th column of X is the most important feature.
Compute a similarity matrix from Fisher's iris data set and rank the features using the similarity matrix.
Load Fisher's iris data set.
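load fisheriris

Find the distance between each pair of observations in meas by using the pdist function, and convert the result to a square matrix by using squareform.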
D = pdist(meas);
Z = squareform(D);
Construct the similarity matrix and confirm that it is symmetric.
S = exp(-Z.^2);
issymmetric(S)
ans = logical

   1
Rank the features.
idx = fsulaplacian(meas,'Similarity',S)
idx = 1×4

     3     4     1     2
Ranking using the similarity matrix S is the same as ranking by specifying 'NumNeighbors' as size(meas,1).
idx2 = fsulaplacian(meas,'NumNeighbors',size(meas,1))
idx2 = 1×4

     3     4     1     2
X — Input data
numeric matrix
Input data, specified as an n-by-p numeric
matrix. The rows of
X correspond to observations (or points), and the
columns correspond to features.
The software treats NaNs in X as missing data and ignores any row of X containing at least one NaN.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'NumNeighbors',10,'KernelScale','auto' specifies the number of nearest neighbors as 10 and the kernel scale factor as 'auto'.
'Similarity' — Similarity matrix
[] (empty matrix) (default) | symmetric matrix
Similarity matrix, specified as the comma-separated pair consisting of
'Similarity' and an
n-by-n symmetric matrix, where
n is the number of observations. The similarity matrix (or
adjacency matrix) represents the input data by modeling local neighborhood
relationships among the data points. The values in a similarity matrix represent the
edges (or connections) between nodes (data points) that are connected in a similarity graph. For more information,
see Similarity Matrix.
If you specify the
'Similarity' value, then you cannot
specify any other name-value pair argument. If you do not specify the
'Similarity' value, then the software computes a similarity
matrix using the options specified by the other name-value pair arguments.
'Distance' — Distance metric
'euclidean' (default) | character vector | string scalar | function handle
Distance metric, specified as the comma-separated pair consisting of
'Distance' and a character vector, string scalar, or function
handle, as described in this table.
'euclidean' — Euclidean distance (default)

'seuclidean' — Standardized Euclidean distance. Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation computed from X.

'mahalanobis' — Mahalanobis distance using the sample covariance of X

'cityblock' — City block distance

'minkowski' — Minkowski distance. The default exponent is 2. Use the 'P' name-value pair argument to specify a different exponent.

'chebychev' — Chebychev distance (maximum coordinate difference)

'cosine' — One minus the cosine of the included angle between observations (treated as vectors)

'correlation' — One minus the sample correlation between observations (treated as sequences of values)

'hamming' — Hamming distance, which is the percentage of coordinates that differ

'jaccard' — One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ

'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values)

@distfun — Custom distance function handle (see the usage sketch below). A distance function has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...
If your data is not sparse, you can generally compute distance more quickly by using a built-in distance instead of a function handle.
For more information, see Distance Metrics.
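As an illustration, a minimal sketch of passing a custom distance function; the hand-written city block distance below is an assumption for illustration, not taken from this page:

% Hypothetical custom distance: city block distance computed by hand.
% ZI is a 1-by-p observation; ZJ is an m2-by-p matrix of observations.
cityblockFun = @(ZI,ZJ) sum(abs(ZJ - ZI),2);
idx = fsulaplacian(X,'Distance',cityblockFun);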
When you use the 'seuclidean', 'minkowski', or 'mahalanobis' distance metric, you can specify the additional name-value pair argument 'Scale', 'P', or 'Cov', respectively, to control the distance metrics.
Example: 'Distance','minkowski','P',3 specifies to use the Minkowski distance metric with an exponent of 3.
'NumNeighbors' — Number of nearest neighbors
log(size(X,1)) (default) | positive integer
Number of nearest neighbors used to construct the similarity graph, specified as the comma-separated pair consisting of 'NumNeighbors' and a positive integer.
'KernelScale' — Scale factor
1 (default) | 'auto' | positive scalar
Scale factor for the kernel, specified as the comma-separated pair consisting of 'KernelScale' and 'auto' or a positive scalar. The software uses the scale factor to transform distances to similarity measures. For more information, see Similarity Graph.
The 'auto' option is supported only for the 'euclidean' and 'seuclidean' distance metrics.
If you specify 'auto', then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. To reproduce results, set a random number seed using rng before calling fsulaplacian.
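For example, a minimal sketch (assuming X is a numeric data matrix already in the workspace):

rng('default')  % fix the seed so the subsampling heuristic is repeatable
idx = fsulaplacian(X,'KernelScale','auto');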
idx — Indices of features ordered by feature importance
numeric vector
Indices of the features in X ordered by feature importance, returned as a numeric vector. For example, if idx(3) is 5, then the third most important feature is the fifth column in X.
scores — Feature scores
numeric vector
Feature scores, returned as a numeric vector. A large score value in scores indicates that the corresponding feature is important. The scores have the same order as the features in X.

fsulaplacian computes the values in scores as follows:
For each data point in X, define a local neighborhood using the nearest neighbor method, and find the pairwise distances $D_{ij}$ for all points i and j in the neighborhood.
Convert the distances to the similarity matrix S using the kernel transformation $S_{ij} = \exp\left(-\left(\frac{D_{ij}}{\sigma}\right)^2\right)$, where $\sigma$ is the scale factor for the kernel as specified by the 'KernelScale' name-value pair argument.
Center each feature by removing its mean:
$$\tilde{x}_r = x_r - \frac{x_r^{T} D_g \mathbf{1}}{\mathbf{1}^{T} D_g \mathbf{1}}\,\mathbf{1},$$
where $x_r$ is the rth feature, $D_g$ is the degree matrix $D_g = \operatorname{diag}(S\mathbf{1})$, and $\mathbf{1} = [1, 1, \ldots, 1]^{T}$.
Compute the score $s_r$ for each feature:
$$s_r = \frac{\tilde{x}_r^{T} S\, \tilde{x}_r}{\tilde{x}_r^{T} D_g\, \tilde{x}_r}.$$
Note that [1] defines the Laplacian score as
$$L_r = \frac{\tilde{x}_r^{T} L\, \tilde{x}_r}{\tilde{x}_r^{T} D_g\, \tilde{x}_r} = 1 - \frac{\tilde{x}_r^{T} S\, \tilde{x}_r}{\tilde{x}_r^{T} D_g\, \tilde{x}_r},$$
where L is the Laplacian matrix, defined as the difference between $D_g$ and S. The fsulaplacian function uses only the second term of this equation for the score value of scores so that a large score value indicates an important feature.
Selecting features using the Laplacian score is consistent with minimizing the value
$$\frac{\sum_{i,j} \left(x_{ir} - x_{jr}\right)^2 S_{ij}}{\operatorname{Var}(x_r)},$$
where $x_{ir}$ represents the ith observation of the rth feature. Minimizing this value implies that the algorithm prefers features with large variance. Also, the algorithm assumes that two data points of an important feature are close if and only if the similarity graph has an edge between the two data points.
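To make the steps concrete, here is a minimal sketch (an illustration under the definitions above, not the fsulaplacian implementation) that computes the scores from a data matrix X and a precomputed similarity matrix S:

function scores = laplacianScoreSketch(X,S)
% Sketch of the Laplacian-score steps described above.
% X is an n-by-p data matrix; S is an n-by-n similarity matrix.
n = size(X,1);
one = ones(n,1);
Dg = diag(S*one);                        % degree matrix Dg = diag(S*1)
scores = zeros(1,size(X,2));
for r = 1:size(X,2)
    xr = X(:,r);
    % Center the feature by removing its degree-weighted mean
    xt = xr - ((xr'*Dg*one)/(one'*Dg*one))*one;
    % Score: a large value indicates an important feature
    scores(r) = (xt'*S*xt)/(xt'*Dg*xt);
end
end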
[1] He, X., D. Cai, and P. Niyogi. "Laplacian Score for Feature Selection." NIPS Proceedings. 2005.