surrogateAssociation

Mean predictive measure of association for surrogate splits in classification tree

Syntax

ma = surrogateAssociation(tree)
ma = surrogateAssociation(tree,N)

Description

ma = surrogateAssociation(tree) returns a matrix of predictive measures of association for the predictors in tree.

ma = surrogateAssociation(tree,N) returns a matrix of predictive measures of association averaged over the nodes in vector N.

Input Arguments

 tree    A classification tree constructed with fitctree, or a compact classification tree constructed with compact.

 N       Vector of node numbers in tree.

Output Arguments

 ma      ma = surrogateAssociation(tree) returns a P-by-P matrix, where P is the number of predictors in tree. ma(i,j) is the predictive measure of association between the optimal split on variable i and a surrogate split on variable j. For more details, see Algorithms.

         ma = surrogateAssociation(tree,N) returns a P-by-P matrix representing the predictive measure of association between variables, averaged over the nodes in the vector N. N contains node numbers from 1 to max(tree.NumNodes).

Examples

Load Fisher's iris data set.

load fisheriris

Grow a classification tree using species as the response. Specify the use of surrogate splits for missing values.

tree = fitctree(meas,species,'surrogate','on');

Find the mean predictive measure of association between the predictor variables.

ma = surrogateAssociation(tree)
ma = 4×4

    1.0000         0         0         0
         0    1.0000         0         0
    0.4633    0.2500    1.0000    0.5000
    0.2065    0.1413    0.4022    1.0000

Find the mean predictive measure of association averaged over the odd-numbered nodes in tree.

N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)
ma = 4×4

    1.0000         0         0         0
         0    1.0000         0         0
    0.7600    0.5000    1.0000    1.0000
    0.4130    0.2826    0.8043    1.0000
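
The rows and columns of ma are easier to read when labeled with predictor names. The following is a minimal sketch rather than part of the original example. It rebuilds the tree so that it runs on its own, and the variable names names and maTable are introduced here only for illustration. Rows correspond to the optimal split predictor i and columns to the surrogate split predictor j.

```matlab
% Rebuild the tree from the example so this sketch is self-contained.
load fisheriris
tree = fitctree(meas,species,'surrogate','on');
ma = surrogateAssociation(tree);

% Label rows (optimal split predictor) and columns (surrogate split
% predictor) with the predictor names stored in the tree.
names = tree.PredictorNames;
maTable = array2table(ma,'RowNames',names,'VariableNames',names)
```

Because no predictor names were passed to fitctree, the labels are whatever default names the tree stores in tree.PredictorNames.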

Algorithms

Element ma(i,j) is the predictive measure of association averaged over surrogate splits on predictor j for which predictor i is the optimal split predictor. This average is computed by summing positive values of the predictive measure of association over optimal splits on predictor i and surrogate splits on predictor j and dividing by the total number of optimal splits on predictor i, including splits for which the predictive measure of association between predictors i and j is negative.
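
The averaging rule above can be illustrated with a small numeric sketch. The values in lambda are made up for illustration and do not come from any fitted tree, and the variable names lambda, numSplits, positiveSum, and maRow are hypothetical. Each row of lambda stands for one optimal split on predictor i, and each column for a surrogate predictor j.

```matlab
% Hypothetical per-split association values (made up for illustration).
% Rows: optimal splits on predictor i. Columns: surrogate predictors j.
lambda = [ 0.6  -0.2
           0.3   0.1
          -0.4   0.5
           0.0   0.2 ];

% The denominator counts every optimal split on predictor i, including
% splits whose association with predictor j is negative.
numSplits = size(lambda,1);

% Only positive values contribute to the numerator.
positiveSum = sum(max(lambda,0),1);

% One row of the averaged matrix: approximately [0.225 0.2].
maRow = positiveSum / numSplits
```

Note that the negative entries lower the average only through the denominator: they add nothing to the sum but still count as splits.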