bhattacharyyaDistance

One-dimensional Bhattacharyya distance between two independent data groups to measure class separability

Description

bhattacharyyaDistance is a function used in code generated by Diagnostic Feature Designer.

Z = bhattacharyyaDistance(X,I) calculates the one-dimensional Bhattacharyya distances between two independent subsets of data set X that are grouped according to the logical labels in I. The Bhattacharyya distance provides a metric for ranking features according to their ability to separate two classes of data, such as data from healthy and faulty machines. The distance calculation assumes that the data in X follows a Gaussian distribution.
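Under the Gaussian assumption, a standard closed form exists for the one-dimensional case. If the values of one feature in class 0 and class 1 are modeled as normal with means μ_0, μ_1 and standard deviations σ_0, σ_1, the Bhattacharyya distance is given below. This page does not state exactly how bhattacharyyaDistance estimates these quantities (for example, which variance normalization it uses), so treat this as the textbook form rather than the function's verbatim implementation.

```latex
D_B = \frac{1}{4}\,\frac{(\mu_0-\mu_1)^2}{\sigma_0^2+\sigma_1^2}
    + \frac{1}{2}\,\ln\!\left(\frac{\sigma_0^2+\sigma_1^2}{2\,\sigma_0\,\sigma_1}\right)
```

The first term grows as the class means move apart relative to their combined spread. The second term is zero when σ_0 = σ_1 and positive otherwise, so a feature whose class variances differ also scores higher.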

Specifically, the generated code calls bhattacharyyaDistance when you rank features in Diagnostic Feature Designer using the Bhattacharyya method.

Input Arguments

X

Data set containing data samples that can be logically classified into two groups, specified as a vector when you have a single set of samples, such as values for one feature, or a matrix when you have multiple sets of samples.

  • When X contains a single set of n features, such as a set of multiple features extracted from a single data source, X is a 1-by-n vector.

  • When X contains m sets of n features, X is an m-by-n matrix. Each row in X represents one data source and must correspond to a single logical class.

To produce valid Bhattacharyya distance values, X must contain at least two rows labeled 0 in I and at least two rows labeled 1.
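A plausible reason for the two-per-class minimum is that, with only one sample, a group's variance is either zero or undefined (depending on normalization), and the Gaussian form of the distance then breaks down. The Python sketch below checks the requirement before any distance is computed; `has_enough_samples` and its `min_per_class` parameter are hypothetical names, not part of the toolbox.

```python
import numpy as np


def has_enough_samples(I, min_per_class=2):
    """Return True when each logical class in I has at least min_per_class rows.

    Hypothetical helper mirroring the documented requirement of at least
    two rows labeled 0 and two rows labeled 1.
    """
    labels = np.asarray(I, dtype=bool)
    n_class1 = int(np.count_nonzero(labels))  # rows labeled 1 (e.g. faulty)
    n_class0 = labels.size - n_class1         # rows labeled 0 (e.g. healthy)
    return n_class0 >= min_per_class and n_class1 >= min_per_class


print(has_enough_samples([0, 0, 1, 1]))  # True
print(has_enough_samples([0, 1, 1, 1]))  # False: only one row labeled 0
```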

For example, suppose that you have a set of five features for each of 20 gearboxes and you are computing the Bhattacharyya distances to assess these features. X is a 20-by-5 matrix. Each row represents a gearbox that is either healthy or faulty, as indicated by the associated logical class label of 0 or 1. At least two gearboxes must be healthy and at least two gearboxes must be faulty. The Bhattacharyya distance indicates how well each feature separates the data for the healthy gearboxes from the data for the faulty gearboxes.

I

Logical classification labels that assign each row in X to one of two logical classes, specified as a vector of length m, where m is the number of rows in X.

For example, suppose once more that X is a 20-by-5 matrix corresponding to 20 gearboxes. The first 9 gearboxes are healthy. The remaining 11 gearboxes are faulty. Define the healthy state as 0 and the faulty state as 1. Then I has a length of 20. The first 9 labels in I are equal to 0 and the remaining 11 labels are equal to 1.
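The labeling in this example can be mirrored in Python with NumPy boolean masks, which split the rows of X into the two classes that the distance compares. The random values are a hypothetical stand-in for real gearbox features.

```python
import numpy as np

# Hypothetical stand-in data: 20 gearboxes (rows) x 5 features (columns).
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(20, 5))

# First 9 gearboxes are healthy (0); the remaining 11 are faulty (1).
I = np.concatenate([np.zeros(9, dtype=bool), np.ones(11, dtype=bool)])

healthy = X[~I]  # rows labeled 0
faulty = X[I]    # rows labeled 1
print(healthy.shape, faulty.shape)  # (9, 5) (11, 5)
```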

Output Arguments

Z

Bhattacharyya distances between the labeled groups, returned as a scalar or a vector of length n.

  • If X is a vector, then Z is a scalar.

  • If X is a matrix, then bhattacharyyaDistance calculates the distance separately for each feature. Z is then a vector of length n, where n is the number of columns in X.

bhattacharyyaDistance treats NaN entries in X as missing values and ignores them.
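To make the calculation concrete, here is a minimal Python/NumPy sketch of a per-feature distance that follows the behavior described above: rows are split by the logical labels, NaN entries are skipped, a vector input yields a scalar, and a matrix input yields one distance per column. It assumes the univariate Gaussian closed form and unbiased (N − 1) variance estimates; `bhattacharyya_distance` is a hypothetical name, and the toolbox's own implementation may differ in normalization and edge-case handling.

```python
import numpy as np


def bhattacharyya_distance(X, I):
    """Per-feature Bhattacharyya distance between two labeled groups (sketch).

    Assumes each feature is Gaussian within each class and uses unbiased
    (ddof=1) variances. NaN entries are ignored by the nan* reductions.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(I, dtype=bool)
    vector_input = X.ndim == 1
    if vector_input:
        X = X[:, np.newaxis]  # treat a vector as samples of a single feature
    if labels.shape[0] != X.shape[0]:
        raise ValueError("I must contain one label per row of X")

    group0, group1 = X[~labels], X[labels]
    mu0 = np.nanmean(group0, axis=0)
    mu1 = np.nanmean(group1, axis=0)
    var0 = np.nanvar(group0, axis=0, ddof=1)
    var1 = np.nanvar(group1, axis=0, ddof=1)

    # Mean-separation term plus variance-mismatch term, one value per column.
    mean_term = 0.25 * (mu0 - mu1) ** 2 / (var0 + var1)
    var_term = 0.5 * np.log((var0 + var1) / (2.0 * np.sqrt(var0 * var1)))
    Z = mean_term + var_term
    return float(Z[0]) if vector_input else Z


print(bhattacharyya_distance([0, 2, 4, 6], [0, 0, 1, 1]))          # 1.0
print(bhattacharyya_distance([0, 2, np.nan, 4, 6], [0, 0, 0, 1, 1]))  # 1.0
```

In both calls the groups have means 1 and 5 and equal variances of 2, so only the mean-separation term contributes; in the second call the NaN in class 0 is skipped and the result is unchanged.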

Introduced in R2020a