Getting NaN when computing partialcorr (no NaNs in data)

Hi, I am using partialcorr on series of data and it sometimes returns NaNs. Why is that? I am sure I have no NaNs in my data and no missing or empty entries. Sometimes using partialcorr([x y], 'rows','complete') helps, but it does not always fix the problem. Thanks for any help.

4 comments

Kate on 20 Jun 2016
Could you provide some sample data/code? Make sure that your columns are variables and rows are observations. It could be an effect of how many variables you are partially correlating, or filtering for statistical significance (which you should definitely do). Hope that helps.
Well, it's been a year since this question was asked, but there has never been an answer. I have the same problem: I'm using partialcorr to calculate the correlation between two variables (flowering date and cumulated temperature) while controlling for two other variables at the same time.
I calculated the cumulated temperature over different periods of the year (31 different periods altogether) and want to know which period of the year explains the greatest variance in the flowering date while I already have two other variables in the model.
For 30 of the 31 periods I used, partialcorr runs without a problem; however, there is one for which partialcorr returns NaN.
I provided the data (64 years/observations in total) and this is the command I used:
partialcorr([flower_date,cum_temp],[Var1,Var2])
I'd be grateful for any help!
Cheers, Sarah
Encountering the same issue. I hope someone can help.
dpb on 10 Oct 2022
Edited: dpb on 10 Oct 2022
tF=readtable(websave('Test_data.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/125764/Test_data.txt'));
partialcorr([tF.flower_date,tF.cum_temp],[tF.Var1,tF.Var2])
ans = 2×2
     1   NaN
   NaN   NaN
fitlm(tF,'predictorVars',{'cum_temp','Var1','Var2'},'ResponseVar','flower_date','intercept',true)
Warning: Regression design matrix is rank deficient to within machine precision.
ans =
Linear regression model:
    flower_date ~ 1 + Var1 + Var2 + cum_temp

Estimated Coefficients:
                   Estimate       SE         tStat       pValue
                   ________    _________    _______    __________
    (Intercept)           0            0        NaN           NaN
    Var1             17.841      0.25253     70.647    1.8066e-59
    Var2           -0.42291     0.016155    -26.178    1.5975e-34
    cum_temp        0.36047    0.0049775     72.419    4.1539e-60

Number of observations: 64, Error degrees of freedom: 61
Root Mean Squared Error: 3.28
R-squared: 0.845, Adjusted R-Squared: 0.84
F-statistic vs. constant model: 167, p-value = 1.9e-25
So partialcorr isn't lying to us; let's see what's going on between the independent variables themselves...
corrcoef([tF.cum_temp,tF.Var1,tF.Var2])
ans = 3×3
    1.0000   -0.9174   -0.4560
   -0.9174    1.0000    0.7726
   -0.4560    0.7726    1.0000
OK, none of those is identically 1. Although cum_temp is very highly correlated with Var1, and Var1 and Var2 are fairly highly correlated with each other, no pair is perfectly correlated. So the conclusion has to be that cum_temp is a linear combination of the other two. Let's check that next:
fitlm(tF,'predictorVars',{'Var1','Var2'},'ResponseVar','cum_temp','intercept',true)
ans =
Linear regression model:
    cum_temp ~ 1 + Var1 + Var2

Estimated Coefficients:
                   Estimate    SE    tStat    pValue
                   ________    __    _____    ______
    (Intercept)         427     0      Inf         0
    Var1                -61     0     -Inf         0
    Var2                  1     0      Inf         0

Number of observations: 64, Error degrees of freedom: 61
R-squared: 1, Adjusted R-Squared: 1
F-statistic vs. constant model: 8.54e+29, p-value = 0
That last fit shows that cum_temp is predicted exactly by a linear combination of Var1 and Var2 (cum_temp = 427 - 61*Var1 + Var2), which explains the NaN results above.
This probably means that Var1, Var2 were/are derived, not observed variables and may throw doubt on the rest of the prior analyses as well, depending on just how those corollary variables were/are defined and what it is that prevented the above result for other cases as well.
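The check above can also be done without fitlm, before calling partialcorr at all. The following is only a minimal sketch under stated assumptions: the data are synthetic stand-ins built to mimic the relation found above (they are not the contents of Test_data.txt), and the tolerance and the names relResid and isExplained are arbitrary choices made for illustration.

```matlab
% Synthetic stand-in data (hypothetical; NOT the poster's Test_data.txt)
rng(0);                               % reproducible random numbers
n        = 64;                        % same number of observations as in the thread
Var1     = randn(n,1);
Var2     = randn(n,1);
cum_temp = 427 - 61*Var1 + Var2;      % exact linear combination, as the fit above found

% Regress the variable of interest on the controls plus an intercept
Z     = [ones(n,1), Var1, Var2];
beta  = Z \ cum_temp;                 % least-squares coefficients
resid = cum_temp - Z*beta;

% Residual size relative to the centered variable; ~0 means "fully explained"
relResid    = norm(resid) / norm(cum_temp - mean(cum_temp));
isExplained = relResid < 1e-8;        % tolerance is an arbitrary choice for this sketch

fprintf('relative residual = %.3g, explained by controls: %d\n', ...
        relResid, double(isExplained));
```

If isExplained comes out true for one of the variables being correlated, a NaN from partialcorr for that variable is the expected outcome rather than a bug, because nothing is left to correlate once the controls are removed.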

Answers (1)

Adam Danz on 4 May 2021
The same basic problem is happening with the partial correlation.
MATLAB's partialcorr follows the steps explained in Wikipedia's Partial Correlation article.
When correlating variable X with variable Y while controlling for variable Z, the X variable may be predicted by Z so their residuals would be 0 or very close to 0. To prevent returning a spurious correlation, the partialcorr function detects residuals close to 0 and sets them to 0 to avoid floating point roundoff error. If you look at the equation in the wiki article, it will be clear why NaN values are returned in those cases since 0/0=NaN.
The partialcorr.m file contains valuable comments by its authors explaining this just above the lines of code that compute the correlation coefficients (r2021a).
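The 0/0 mechanism described above can be reproduced by hand with the residual-based definition of partial correlation. This is only an illustrative sketch, not the code in partialcorr.m: the data are synthetic, and the zeroing threshold tol is an assumption chosen for the example, not the value the toolbox uses.

```matlab
% Synthetic stand-in data (hypothetical; not the poster's data)
rng(1);
n = 64;
z = randn(n,2);                          % control variables
x = 427 - 61*z(:,1) + z(:,2);            % x is fully determined by z
y = 0.5*z(:,1) + randn(n,1);             % y is only partly determined by z

Z  = [ones(n,1), z];                     % controls plus intercept
rx = x - Z*(Z\x);                        % residuals of x given z (roundoff only)
ry = y - Z*(Z\y);                        % residuals of y given z

% Treat residuals that are tiny relative to the data as exactly zero.
% This threshold is an assumption for illustration, not the one in partialcorr.m.
tol = sqrt(eps) * norm(x - mean(x));
if norm(rx) < tol, rx(:) = 0; end

% Correlation of the residuals: numerator and denominator are both 0 here
r = (rx'*ry) / sqrt((rx'*rx) * (ry'*ry));
disp(r)
```

Here r is NaN because both the numerator and the denominator are exactly zero. Without the zeroing step, the same formula would divide one roundoff-sized number by another and return a meaningless but plausible-looking coefficient, which is the spurious correlation the answer refers to.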

Asked: 24 May 2016

Edited: dpb on 10 Oct 2022