How to use Pearson correlation coefficient for feature selection?

Joyjit Chatterjee on 9 Dec 2018
I am trying to use the Pearson correlation coefficient for feature selection on my data. I have a 21392x1974 table, with the 1974 columns as variables/features and the 21392 rows as observations. I have looked into the MathWorks documentation on corrcoef(), but most of the examples use small data sets, and I am not sure how to apply it to a data set of this size. I am also not sure whether the Pearson correlation coefficient can be applied to the 1974th column of my data, which holds string-type labels (like Apple, Ball, Cat, etc.; 14 different classes in total). My aim is to:
1. Calculate the Pearson correlation coefficient between the 7th column and each column of my data. The 7th column will of course show perfect correlation (1) with itself. The goal is to find how strongly every feature is correlated with the 7th column. I would also like to display the column indices in the original data for which the Pearson correlation coefficient is >= 0.70.
2. As a second scenario, find out whether it is possible to compute the Pearson correlation coefficient between the 1974th column (labels/classes) and each column of my data.
I have looked at various resources like http://matlab.izmiran.ru/help/techdoc/ref/corrcoef.html and https://uk.mathworks.com/help/matlab/ref/corrcoef.html , but I am still confused about how this can be done for my data. Any help in this regard would be really appreciated. Cheers and thanks!
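
For reference, a minimal sketch of how the two scenarios could be approached, not a definitive answer. It runs on a small synthetic table standing in for the real 21392x1974 one; the names T, Label, numIdx, selected and R are made up for illustration. It uses only base MATLAB (implicit expansion needs R2016b or later) and assumes the numeric columns contain no NaN values. For scenario 2, Pearson correlation is not defined for unordered string labels, so the sketch assumes a one-vs-rest workaround: each class becomes a 0/1 indicator column, which is then correlated with every feature.

```matlab
% Synthetic stand-in for the real table (sizes and names are hypothetical).
rng(0);
nObs = 200;
X = randn(nObs, 10);                          % 10 numeric features
X(:, 3) = X(:, 7) + 0.1*randn(nObs, 1);       % make column 3 track column 7
T = array2table(X);
T.Label = categorical(randi(3, nObs, 1), 1:3, {'Apple','Ball','Cat'});

% Scenario 1: Pearson r between column 7 and every numeric column.
isNum  = varfun(@isnumeric, T, 'OutputFormat', 'uniform');
numIdx = find(isNum);                         % original indices of numeric columns
A      = T{:, numIdx};                        % numeric data as a matrix
target = T{:, 7};

Ac = A - mean(A, 1);                          % center each column
tc = target - mean(target);                   % center the target
colNorm = sqrt(sum(Ac.^2, 1));                % 1-by-p column norms
r  = (tc' * Ac) ./ (norm(tc) * colNorm);      % 1-by-p vector of correlations

selected = numIdx(r >= 0.70);                 % includes column 7 itself (r = 1)
disp(selected)

% Scenario 2 (assumed workaround): one-vs-rest indicator per class.
classes = categories(T.Label);
R = zeros(numel(classes), numel(numIdx));     % one row per class
for k = 1:numel(classes)
    d  = double(T.Label == classes{k});       % 1 where the row belongs to class k
    dc = d - mean(d);
    R(k, :) = (dc' * Ac) ./ (norm(dc) * colNorm);
end
```

On a matrix of this size, calling corrcoef(A) and reading off the 7th column of the result should give the same values as r, at the cost of computing the full 1974x1974 matrix. If the data has missing values, the 'Rows' option of corrcoef is worth checking in the documentation. Constant columns produce NaN here and therefore never pass the threshold.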

bushra raza on 9 Dec 2018
Joyjit Chatterjee on 9 Dec 2018
Hi. I looked at that, but it is still not clear to me how to compute the correlation for the requirements described in the question.

Release: R2018a
