# relieff

Rank importance of predictors using ReliefF or RReliefF algorithm

## Description


[idx,weights] = relieff(X,y,k) ranks predictors using either the ReliefF or RReliefF algorithm with k nearest neighbors. The matrix X contains the predictor variables, and the vector y contains the response values. The function returns idx, which contains the indices of the predictors ordered from most to least important, and weights, which contains the predictor weights.

If y is numeric, relieff performs RReliefF analysis for regression by default. Otherwise, relieff performs ReliefF analysis for classification using k nearest neighbors per class. For more information on ReliefF and RReliefF, see Algorithms.


[idx,weights] = relieff(X,y,k,Name,Value) specifies additional options using one or more name-value pair arguments. For example, 'updates',10 sets the number of observations randomly selected for computing weights to 10.

## Examples


Load Fisher's iris data set, which contains the flower measurements meas and the species labels species.

load fisheriris

Find the important predictors using 10 nearest neighbors.

[idx,weights] = relieff(meas,species,10)
idx = 1×4

4     3     1     2

weights = 1×4

0.1399    0.1226    0.3590    0.3754

idx shows the predictor numbers listed according to their ranking. The fourth predictor is the most important, and the second predictor is the least important. weights gives the weight values in the same order as the predictors. The first predictor has a weight of 0.1399, and the fourth predictor has a weight of 0.3754.
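The relationship between the two outputs can be illustrated with a short NumPy sketch (illustrative only; Python indexing is 0-based, so 1 is added to match MATLAB's 1-based predictor numbers):

```python
import numpy as np

# Predictor weights from the iris example above
weights = np.array([0.1399, 0.1226, 0.3590, 0.3754])

# idx lists the predictors from most to least important,
# which is the ordering that sorts the weights in descending order
idx = np.argsort(weights)[::-1] + 1   # 1-based, matching MATLAB
print(idx)                            # [4 3 1 2]
print(weights[idx - 1])               # the weights in decreasing order
```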

Load the ionosphere data set, which contains the predictor matrix X and the class labels Y.

load ionosphere

Rank the predictors based on importance using 10 nearest neighbors.

[idx,weights] = relieff(X,Y,10);

Create a bar plot of predictor importance weights.

bar(weights(idx))
xlabel('Predictor rank')
ylabel('Predictor importance weight')

Select the top 5 most important predictors. Find the columns of these predictors in X.

idx(1:5)
ans = 1×5

24     3     8     5    14

The 24th column of X is the most important predictor of Y.

Rank categorical predictors using relieff.

Load the carsmall data set, which contains the vehicle variables Mfg, Model, Origin, and MPG.

load carsmall

Convert the categorical predictor variables Mfg, Model, and Origin to numerical values, and combine them into an input matrix. Specify the response variable MPG.

X = [grp2idx(Mfg) grp2idx(Model) grp2idx(Origin)];
y = MPG;

Find the ranks and weights of the predictor variables using 10 nearest neighbors and treating the data in X as categorical.

[idx,weights] = relieff(X,y,10,'categoricalx','on')
idx = 1×3

2     3     1

weights = 1×3

-0.0019    0.0501    0.0114

The Model predictor is the most important in predicting MPG. The Mfg variable has a negative weight, indicating it is not a good predictor of MPG.

## Input Arguments


Predictor data X, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

Data Types: single | double

Response data y, specified as a numeric vector, categorical vector, logical vector, character array, string array, or cell array of character vectors.

Data Types: single | double | categorical | logical | char | string | cell

Number of nearest neighbors k, specified as a positive integer scalar.

Data Types: single | double

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: relieff(X,y,5,'method','classification','categoricalx','on') specifies 5 nearest neighbors and treats the response variable and predictor data as categorical.

Method for computing weights, specified as the comma-separated pair consisting of 'method' and either 'regression' or 'classification'. If y is numeric, 'regression' is the default method. Otherwise, 'classification' is the default.

Example: 'method','classification'

Prior probabilities for each class, specified as the comma-separated pair consisting of 'prior' and a value in this table.

| Value | Description |
| --- | --- |
| 'empirical' | The class probabilities are determined from the class frequencies in y. |
| 'uniform' | All class probabilities are equal. |
| numeric vector | One value for each distinct group name. |
| structure | A structure S with two fields: S.group contains the group names as a variable of the same type as y, and S.prob contains a vector of corresponding probabilities. |

Example: 'prior','uniform'

Data Types: single | double | char | string | struct

Number of observations to select at random for computing weights, specified as the comma-separated pair consisting of 'updates' and either 'all' or a positive integer scalar. By default, relieff uses all observations.

Data Types: single | double | char | string

Categorical predictors flag, specified as the comma-separated pair consisting of 'categoricalx' and either 'on' or 'off'. If you specify 'on', then relieff treats all predictors in X as categorical. Otherwise, it treats all predictors in X as numeric. You cannot mix numeric and categorical predictors.

Example: 'categoricalx','on'

Distance scaling factor, specified as the comma-separated pair consisting of 'sigma' and a positive numeric scalar. For observation i, the influence of its jth nearest neighbor on the predictor weight is multiplied by ${e}^{-{\left(\text{rank}\left(i,j\right)/\text{sigma}\right)}^{2}}$, where rank(i,j) is the position of the jth observation among the nearest neighbors of the ith observation, sorted by distance. The default is Inf for classification (all nearest neighbors have the same influence) and 50 for regression.

Example: 'sigma',20

Data Types: single | double
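As an illustration of this scaling factor (a standalone Python computation, not a call into relieff):

```python
import math

def influence(rank, sigma):
    """Scaling factor applied to the neighbor at the given rank (1 = closest)."""
    return math.exp(-((rank / sigma) ** 2))

# sigma = Inf (classification default): every neighbor has the same influence
print(influence(1, math.inf), influence(10, math.inf))   # 1.0 1.0

# sigma = 50 (regression default): influence decays with rank
print(round(influence(1, 50), 4))    # 0.9996
print(round(influence(50, 50), 4))   # 0.3679
```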

## Output Arguments


Indices of predictors in X ordered by predictor importance, returned as a numeric vector. For example, if idx(3) is 5, then the third most important predictor is the fifth column in X.

Data Types: double

Weights of the predictors, returned as a numeric vector. The values in weights have the same order as the predictors in X and range from –1 to 1, with large positive weights assigned to important predictors.

Data Types: double

## Tips

• Predictor ranks and weights usually depend on k. If you set k to 1, then the estimates can be unreliable for noisy data. If you set k to a value comparable with the number of observations (rows) in X, relieff can fail to find important predictors. You can start with k = 10 and investigate the stability and reliability of relieff ranks and weights for various values of k.

• relieff removes observations with NaN values.

## Algorithms


### ReliefF

ReliefF finds the weights of predictors in the case where y is a multiclass categorical variable. The algorithm penalizes the predictors that give different values to neighbors of the same class, and rewards predictors that give different values to neighbors of different classes.

ReliefF first sets all predictor weights Wj to 0. Then, the algorithm iteratively selects a random observation xr, finds the k-nearest observations to xr for each class, and updates, for each nearest neighbor xq, all the weights for the predictors Fj as follows:

If xr and xq are in the same class,

${W}_{j}{}^{i}={W}_{j}{}^{i-1}-\frac{{\Delta }_{j}\left({x}_{r},{x}_{q}\right)}{m}\cdot {d}_{rq}.$

If xr and xq are in different classes,

${W}_{j}{}^{i}={W}_{j}{}^{i-1}+\frac{{p}_{{y}_{q}}}{1-{p}_{{y}_{r}}}\cdot \frac{{\Delta }_{j}\left({x}_{r},{x}_{q}\right)}{m}\cdot {d}_{rq}.$

• Wji is the weight of the predictor Fj at the ith iteration step.

• pyr is the prior probability of the class to which xr belongs, and pyq is the prior probability of the class to which xq belongs.

• m is the number of iterations specified by 'updates'.

• ${\Delta }_{j}\left({x}_{r},{x}_{q}\right)$ is the difference in the value of the predictor Fj between observations xr and xq. Let xrj denote the value of the jth predictor for observation xr, and let xqj denote the value of the jth predictor for observation xq.

• For discrete Fj,

${\Delta }_{j}\left({x}_{r},{x}_{q}\right)=\left\{\begin{array}{cc}0,& {x}_{rj}={x}_{qj}\\ 1,& {x}_{rj}\ne {x}_{qj}\end{array}.$

• For continuous Fj,

${\Delta }_{j}\left({x}_{r},{x}_{q}\right)=\frac{|{x}_{rj}-{x}_{qj}|}{\text{max}\left({F}_{j}\right)-\text{min}\left({F}_{j}\right)}.$

• drq is a distance function of the form

${d}_{rq}=\frac{{\stackrel{˜}{d}}_{rq}}{\sum _{l=1}^{k}{\stackrel{˜}{d}}_{rl}}.$

The distance is subject to the scaling

${\stackrel{˜}{d}}_{rq}={e}^{-{\left(\text{rank}\left(r,q\right)/\text{sigma}\right)}^{2}},$

where rank(r,q) is the position of the qth observation among the nearest neighbors of the rth observation, sorted by distance, and k is the number of nearest neighbors specified by the input argument k. You can change the scaling by specifying 'sigma'.
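The procedure above can be sketched in Python with NumPy. This is an illustrative simplification under stated assumptions (continuous predictors, empirical priors, 'updates' equal to all observations, and equal neighbor influence, that is, sigma = Inf), not the relieff implementation:

```python
import numpy as np

def relieff_classification(X, y, k, rng=0):
    """Sketch of the ReliefF weight updates: continuous predictors,
    empirical priors, equal neighbor influence (sigma = Inf)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    classes, y_idx = np.unique(y, return_inverse=True)
    priors = np.bincount(y_idx) / n        # empirical class probabilities
    span = X.max(axis=0) - X.min(axis=0)   # normalizer for continuous delta_j
    m = n                                  # 'updates' = all observations
    W = np.zeros(p)

    for r in rng.permutation(n):           # randomly ordered observations x_r
        for c in range(len(classes)):
            members = np.where((y_idx == c) & (np.arange(n) != r))[0]
            # k nearest neighbors of x_r within class c (Euclidean distance)
            dists = np.linalg.norm(X[members] - X[r], axis=1)
            neighbors = members[np.argsort(dists)[:k]]
            d_rq = 1.0 / len(neighbors)    # equal influence, sums to 1
            for q in neighbors:
                delta = np.abs(X[r] - X[q]) / span
                if c == y_idx[r]:          # hit: penalize differing predictors
                    W -= delta / m * d_rq
                else:                      # miss: reward, weighted by priors
                    W += priors[c] / (1 - priors[y_idx[r]]) * delta / m * d_rq
    return W
```

On well-separated classes, this sketch assigns a large positive weight to a predictor that tracks the class and a smaller weight to a pure-noise predictor.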

### RReliefF

RReliefF works with continuous y. Similar to ReliefF, RReliefF also penalizes the predictors that give different values to neighbors with the same response values, and rewards predictors that give different values to neighbors with different response values. However, RReliefF uses intermediate weights to compute the final predictor weights.

Given two nearest neighbors, assume the following:

• Wdy is the weight of having different values for the response y.

• Wdj is the weight of having different values for the predictor Fj.

• ${W}_{dy\wedge dj}$ is the weight of having different response values and different values for the predictor Fj.

RReliefF first sets the weights Wdy, Wdj, ${W}_{dy\wedge dj}$, and Wj equal to 0. Then, the algorithm iteratively selects a random observation xr, finds the k-nearest observations to xr, and updates, for each nearest neighbor xq, all the intermediate weights as follows:

${W}_{dy}{}^{i}={W}_{dy}{}^{i-1}+{\Delta }_{y}\left({x}_{r},{x}_{q}\right)\cdot {d}_{rq}.$

${W}_{dj}{}^{i}={W}_{dj}{}^{i-1}+{\Delta }_{j}\left({x}_{r},{x}_{q}\right)\cdot {d}_{rq}.$

${W}_{dy\wedge dj}{}^{i}={W}_{dy\wedge dj}{}^{i-1}+{\Delta }_{y}\left({x}_{r},{x}_{q}\right)\cdot {\Delta }_{j}\left({x}_{r},{x}_{q}\right)\cdot {d}_{rq}.$

• The i and i-1 superscripts denote the iteration step number. m is the number of iterations specified by 'updates'.

• ${\Delta }_{y}\left({x}_{r},{x}_{q}\right)$ is the difference in the value of the continuous response y between observations xr and xq. Let yr denote the value of the response for observation xr, and let yq denote the value of the response for observation xq.

${\Delta }_{y}\left({x}_{r},{x}_{q}\right)=\frac{|{y}_{r}-{y}_{q}|}{\text{max}\left(y\right)-\text{min}\left(y\right)}.$

• The ${\Delta }_{j}\left({x}_{r},{x}_{q}\right)$ and drq functions are the same as for ReliefF.

RReliefF calculates the predictor weights Wj after fully updating all the intermediate weights.

${W}_{j}=\frac{{W}_{dy\wedge dj}}{{W}_{dy}}-\frac{{W}_{dj}-{W}_{dy\wedge dj}}{m-{W}_{dy}}.$
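These updates and the final combination step can be sketched in NumPy as well (illustrative, with continuous predictors and equal neighbor influence; not the relieff implementation):

```python
import numpy as np

def rrelieff(X, y, k, rng=0):
    """Sketch of the RReliefF intermediate-weight updates: continuous
    predictors, equal neighbor influence."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    x_span = X.max(axis=0) - X.min(axis=0)  # normalizer for delta_j
    y_span = y.max() - y.min()              # normalizer for delta_y
    m = n                                   # 'updates' = all observations
    W_dy, W_dj, W_dy_dj = 0.0, np.zeros(p), np.zeros(p)

    for r in rng.permutation(n):
        others = np.delete(np.arange(n), r)
        dists = np.linalg.norm(X[others] - X[r], axis=1)
        neighbors = others[np.argsort(dists)[:k]]
        d_rq = 1.0 / len(neighbors)         # equal influence, sums to 1
        for q in neighbors:
            delta_y = abs(y[r] - y[q]) / y_span
            delta_j = np.abs(X[r] - X[q]) / x_span
            W_dy += delta_y * d_rq
            W_dj += delta_j * d_rq
            W_dy_dj += delta_y * delta_j * d_rq

    # Final predictor weights from the fully updated intermediate weights
    return W_dy_dj / W_dy - (W_dj - W_dy_dj) / (m - W_dy)
```

When the response depends on one predictor and not another, this sketch ranks the informative predictor higher.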

## References

[1] Kononenko, I., E. Simec, and M. Robnik-Sikonja. (1997). “Overcoming the myopia of inductive learning algorithms with RELIEFF.” Retrieved from CiteSeerX: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.4740

[2] Robnik-Sikonja, M., and I. Kononenko. (1997). “An adaptation of Relief for attribute estimation in regression.” Retrieved from CiteSeerX: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.8381

[3] Robnik-Sikonja, M., and I. Kononenko. (2003). “Theoretical and empirical analysis of ReliefF and RReliefF.” Machine Learning, 53, 23–69.