Correlation and regression between matrices with NaN values

Hello!
I want to calculate the regression and correlation coefficients between two matrices (temperature and sea level pressure) that have the same dimensions, 241 x 81, but contain some NaN values.
The final goal is a 2D matrix that I can plot (see attached image), i.e. for every point on the map I would have a value for the correlation and regression coefficients.
Thanks a lot!

3 comments

Hi Carlotta,
Could you share some data so we can help you better?
MarKf on 5 Sep 2023 (edited 5 Sep 2023)
Maybe there is no need for data. If the two matrices are 2D and have the same dimensions, they can be correlated even if they contain NaNs. However, you would not obtain 241 x 81 values: the result would not be a matrix of the same size. Unless you cross-correlate, the correlation gives you a vector of either 81 or 241 values (rhos or R² values, depending on the kind of correlation and on which dimension you correlate along), or fewer, depending on the missing values and on what you decide to do with them. Cross-correlating will not give you a matrix corresponding to the same 2D locations either, so I'm guessing that's not what you want. Maybe having the data can help us understand.
As for the NaNs, you have a few options, such as the 'rows','complete' name-value pair, which ignores rows containing NaN values; that is likely what you need: R = corr(A,B,'rows','complete').
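To make the "vector, not map" point concrete, here is a sketch on made-up data (A and B below are hypothetical stand-ins, not the attached files). It correlates each column of A with the matching column of B, skipping NaN pairs column by column, and returns one coefficient per column. With the Statistics and Machine Learning Toolbox, diag(corr(A, B, 'rows', 'pairwise')) should give the same numbers. Note that 'rows','complete' drops every row that has a NaN in any column of A or B; in this example every row contains at least one NaN, so nothing would be left.

```matlab
% Hypothetical stand-in data (not the OP's files): two 241 x 81 fields with scattered NaNs
nr = 241; nc = 81;
[J, I] = meshgrid(1:nc, 1:nr);      % I = row index, J = column index
A = I + 0.5*J;                      % temperature-like field (hypothetical)
B = 3*I + cos(I.*J);                % pressure-like field (hypothetical)
A(mod(I + J, 7) == 0) = NaN;        % sprinkle NaNs so that every row contains one

% Column-by-column Pearson correlation, using only rows where both columns are finite
rho = nan(1, nc);
for k = 1:nc
    ok = ~isnan(A(:, k)) & ~isnan(B(:, k));
    if nnz(ok) > 2
        a = A(ok, k) - mean(A(ok, k));   % centered values of column k of A
        b = B(ok, k) - mean(B(ok, k));   % centered values of column k of B
        rho(k) = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));
    end
end

disp(size(rho))                     % one coefficient per column: 1 x 81, not a 241 x 81 map
```

Correlating along rows instead (e.g. passing A.' and B.') would give 241 values, still not a map.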


Answers (2)

I see, "array1" has some islands of values in a sea of NaNs.
ar1 = load(websave('rd', "https://nl.mathworks.com/matlabcentral/answers/uploaded_files/1473551/array1.mat"));
ar2 = load(websave('rd', "https://nl.mathworks.com/matlabcentral/answers/uploaded_files/1473556/array2.mat"));
a1 = ar1.array3; a2 = ar2.d;
ar1_0s = a1;
ar1_0s(isnan(ar1_0s)) = 0;      % NaN -> 0, for display only
imagesc(ar1_0s*10^2 + a2);      % here to visualize what I mean
So you have only sum(sum(~isnan(a1))) = 1719 non-NaN values to correlate. As I mentioned in the comment above, you cannot make a map over the 2D locations of those islands unless you have a pair of vectors for each location you want to correlate. I also thought you could use normxcorr2, but again that's probably not what you want, given that these are geo/meteorological data.
You can still correlate the values at all the locations you have, i.e. a1(:) and a2(:) (each input converted into its vector representation); corrcoef does that automatically:
corrcoef(a1,a2, 'rows','complete')
ans = 2×2
    1.0000    0.5341
    0.5341    1.0000
which gives rho = 0.5341.
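What corrcoef(a1, a2, 'rows', 'complete') does can be spelled out by hand: flatten both arrays, keep only the positions where both are finite, and compute the Pearson coefficient of what is left. A sketch with hypothetical stand-in arrays (one rectangular "island" of values in a sea of NaNs, not the attached data):

```matlab
% Hypothetical stand-ins for a1 (islands in a sea of NaNs) and a2
[J, I] = meshgrid(1:81, 1:241);
a1 = 0.1*I + sin(J);                 % temperature-like values (hypothetical)
a1(I < 100 | J > 40) = NaN;          % keep one rectangular island of data
a2 = 1e5 + 10*I;                     % pressure-like values (hypothetical)

% Flatten and keep only positions where both arrays are finite
ok = ~isnan(a1(:)) & ~isnan(a2(:));
x = a1(ok);
y = a2(ok);
n = nnz(ok);                         % number of complete pairs

% Pearson correlation of the remaining pairs
xc = x - mean(x);
yc = y - mean(y);
rho = sum(xc .* yc) / sqrt(sum(xc .^ 2) * sum(yc .^ 2));

% Cross-check with corrcoef on the already-cleaned vectors
R = corrcoef(x, y);
fprintf('n = %d, rho = %.4f, corrcoef = %.4f\n', n, rho, R(1, 2));
```

Because the location information is discarded by a1(:), the result is a single number, which is why it cannot produce a map.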
dpb on 5 Sep 2023 (edited 5 Sep 2023)
"...regression and correlation coefficient between two matrices (temperature and sea level pressure), ... to have a .... for every point in the map .. value for my correlations and regression coefficients"
whos -file array1
  Name        Size       Bytes   Class     Attributes
  array3      241x81     156168  double
whos -file array2
  Name        Size       Bytes   Class     Attributes
  d           241x81     156168  double
load array1
array3(1:5,1:5)
ans = 5×5
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
load array2
d(1:5,1:5)
ans = 5×5
   1.0e+05 *
    1.0038    1.0039    1.0040    1.0041    1.0041
    1.0038    1.0039    1.0040    1.0041    1.0041
    1.0038    1.0039    1.0039    1.0041    1.0041
    1.0038    1.0039    1.0039    1.0040    1.0040
    1.0038    1.0039    1.0039    1.0040    1.0040
Pretty meaningless variable names; one presumes the array with values on the order of 1e5 must be P and the other, by elimination, T?
However, for each point in the 2D array there is only one value each of T and P, so there is no "regression" or "correlation" of the two on a pointwise basis. You can look at the overall correlation between the two variables, but there is nothing to regress against or compare pointwise.
[r,p]=corrcoef(d,array3,'rows','complete')
r = 2×2
    1.0000    0.5341
    0.5341    1.0000
p = 2×2
    1.0000    0.0000
    0.0000    1.0000
This gives the overall correlation between the two arrays over the locations where both are finite; that's about all there is to be gained from these data in that regard.
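The p-value printed as 0.0000 above is simply a very small number. corrcoef's p-values test the hypothesis of no correlation by turning r into a t statistic with n - 2 degrees of freedom; the sketch below repeats that calculation by hand, using betainc for the two-sided Student's t tail so no toolbox is needed. The sample size is an assumption: it takes the 1719 non-NaN values counted in the other answer and assumes d is finite at all of them.

```matlab
% Inputs: the coefficient reported above and an ASSUMED sample size
r = 0.5341;   % correlation from corrcoef above
n = 1719;     % assumption: every non-NaN value of array3 pairs with a finite value of d

df    = n - 2;                                      % degrees of freedom
tstat = r * sqrt(df / (1 - r^2));                   % t statistic for H0: rho = 0
p     = betainc(df / (df + tstat^2), df / 2, 0.5);  % two-sided p-value from Student's t
fprintf('t = %.2f, p = %.3g\n', tstat, p);
```

A tiny p only says the linear association is unlikely to be zero; as discussed below, it says nothing about whether a single pooled coefficient is physically meaningful here.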
ADDENDUM:
What might be interesting would be
scatter(d,array3)
Indeed... there are some pretty clear correlations within given sets of data: the various columns are heavily correlated in having a definite set of trends, but it is the relationship from one observation to the next that is correlated, not that the two variables are highly (linearly) correlated.
Wonder how many columns contain at least one observation...
nnz(any(isfinite(array3)))
ans = 41
So 41 of the 81 columns have at least one observation, which means there are 41 separate traces in the plot above...
What this means I dunno, but it is pretty interesting, and it indicates that the overall correlation coefficient doesn't really tell us much and is probably of no practical value.

4 comments

Thank you dpb for your time and precious comments. Running the same code you wrote here gave me the same results. I think my mistake was in the methodological approach, i.e., instead of comparing the two matrices I need to calculate the correlation coefficient between the pressure matrix and the time series of temperature.
Only if you have two time series (both variables, e.g. temperature and pressure over the year) for each location could you get a topography of correlations, as I said. That might be interesting, and it would look like the map you posted (have a look at the worldmap, axesm, plotm, geoshow family of functions to plot that, by the way).
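With a time series of both variables at every grid point, the "topography of correlations" is a Pearson correlation computed along the time dimension, separately at each point. A minimal sketch, assuming the data are stored as nlat x nlon x ntime arrays (the layout, sizes, and values below are made up for illustration), with NaNs handled per point:

```matlab
% Hypothetical 3D fields, nlat x nlon x ntime (e.g. monthly values at each grid point)
nlat = 4; nlon = 5; nt = 24;
season = reshape(sin(2*pi*(1:nt)/12), 1, 1, nt);   % seasonal cycle along dimension 3
[LON, LAT] = meshgrid(1:nlon, 1:nlat);
T = LAT + LON .* season;                % temperature-like: varies in time at each point
P = 1e5 + LAT - 50 * LON .* season;     % pressure-like: opposite seasonal signal
T(1, 1, :) = NaN;                       % a grid point with no data at all
T(2, 3, 5) = NaN;                       % a single missing time step

% Use only time steps where both variables are finite at that point
ok = ~isnan(T) & ~isnan(P);
n  = sum(ok, 3);                        % number of valid time steps per point
T0 = T;  T0(~ok) = 0;
P0 = P;  P0(~ok) = 0;
dT = (T0 - sum(T0, 3) ./ n) .* ok;      % anomalies from each point's own mean
dP = (P0 - sum(P0, 3) ./ n) .* ok;

% Pearson correlation along time, one value per grid point
r = sum(dT .* dP, 3) ./ sqrt(sum(dT .^ 2, 3) .* sum(dP .^ 2, 3));
r(n < 3) = NaN;                         % too few pairs to say anything
disp(r)
```

The seasonal signals here are built to be exactly opposite, so every point with data gets r = -1 and the empty point stays NaN. The 2D array r is what you would pass to imagesc or to the mapping functions above.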
Otherwise, with a temperature time series for each location but only a latitude-by-longitude pressure matrix with a single value per location, as above, the only thing you can do is average the temperature and compute the same correlation as above.
Possibly most of the variance of both variables in the location-based correlation above (which is still highly significant, by the way, with rho = 0.53) could be explained by latitude, but with a time-series-based correlation topography you wouldn't need to control for that (e.g. by regressing out latitude or other gradient components). This might explain the trends seen above in the columns (it could also be altitude or distance from the sea, but since the trends follow the columns, latitude seems likely).
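One way to "regress out" latitude is a partial correlation: fit a linear latitude trend to each variable, subtract it, and correlate the residuals. The sketch below assumes rows are latitude (the thread does not establish which dimension is which) and uses made-up fields in which the two variables follow latitude in opposite directions but share a longitude-dependent signal.

```matlab
% Hypothetical single-snapshot fields on a 241 x 81 grid.
% ASSUMPTION: rows correspond to latitude and columns to longitude.
[lon, lat] = meshgrid(linspace(-20, 40, 81), linspace(30, 90, 241));
u = sin(lon);                          % arbitrary longitude-dependent signal shared by both fields
T = -0.5 * lat + u;                    % temperature-like: decreases with latitude
P = 1e5 + 20 * lat + 3 * u;            % pressure-like: increases with latitude
T(1:10, :) = NaN;                      % some missing rows

% Keep complete pairs only
ok = ~isnan(T) & ~isnan(P);
x = lat(ok);  t = T(ok);  p = P(ok);

% Remove a linear latitude trend (intercept + slope) from each variable
X = [ones(size(x)) x];
resT = t - X * (X \ t);
resP = p - X * (X \ p);

Rraw = corrcoef(t, p);                 % dominated by the shared latitude gradient
Rres = corrcoef(resT, resP);           % correlation left after removing latitude
rRaw = Rraw(1, 2);
rPartial = Rres(1, 2);
fprintf('raw r = %.3f, latitude-removed r = %.3f\n', rRaw, rPartial);
```

In this constructed case the raw correlation is strongly negative while the latitude-removed one is +1, showing how a shared gradient can dominate a pooled coefficient.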
Actually there is a very useful toolbox, the Climate Data Toolbox, with which you can calculate the correlation between a time series and a 3D dataset (maybe you already knew it).
It is very useful, especially for people working with climate and oceanographic data.
And indeed using the corr3 function I got the same map :)
"instead of comparing the two matrices I need to calculate the correlation coefficient between the pressure matrix and the time series of temperature."
As the other respondent noted, you would need multiple arrays at different times to do that, which I suppose you probably do have.
But the correlations above are by position and probably just reflect the changing depth as one traverses the latitude. Also, you haven't told us which temperature is actually measured, nor precisely what the pressure measurement is the pressure of... if it's atmospheric pressure at sea level, it's going to be greatly influenced by whatever else is going on in the global weather patterns at the time.
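For reference, the computation corr3 was used for above (correlating one time series with a 3D dataset) can also be written out directly, and the same sums give the regression coefficient map the original question asked for. This is a hand-rolled sketch on made-up data, not the Climate Data Toolbox implementation; it assumes the field is stored as nlat x nlon x ntime and the series has one value per time step.

```matlab
% Hypothetical inputs: a pressure-like field (nlat x nlon x nt) and one temperature series (nt x 1)
nlat = 3; nlon = 4; nt = 36;
Tser = 10 + 5 * sin(2*pi*(1:nt)' / 12);        % single temperature time series
anom = reshape(Tser - mean(Tser), 1, 1, nt);   % its anomalies, laid out along dimension 3
[LON, LAT] = meshgrid(1:nlon, 1:nlat);
slopeTrue = LAT - LON - 0.5;                   % made-up regression coefficient per point
Pfield = 1e5 + slopeTrue .* anom;              % field responds linearly to the series
Pfield(2, 2, :) = NaN;                         % a grid point with no data

% Copy the series to every grid point and keep time steps where both are finite
x  = repmat(reshape(Tser, 1, 1, nt), nlat, nlon);
ok = ~isnan(Pfield) & ~isnan(x);
n  = sum(ok, 3);
x0 = x;       x0(~ok) = 0;
y0 = Pfield;  y0(~ok) = 0;
dx = (x0 - sum(x0, 3) ./ n) .* ok;             % anomalies of the series at each point
dy = (y0 - sum(y0, 3) ./ n) .* ok;             % anomalies of the field at each point

Sxy = sum(dx .* dy, 3);
Sxx = sum(dx .^ 2, 3);
Syy = sum(dy .^ 2, 3);
r     = Sxy ./ sqrt(Sxx .* Syy);               % correlation map
slope = Sxy ./ Sxx;                            % least-squares slope map (field units per unit of the series)
r(n < 3) = NaN;
slope(n < 3) = NaN;
```

Here the slope map recovers slopeTrue and r is +1 or -1 depending on its sign; with real data either map can be passed to imagesc or the mapping functions mentioned earlier.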


Asked: 5 Sep 2023

Commented: dpb on 6 Sep 2023
