How to compute the correlation between two matrices of the same dimension (correlation between a column of one matrix and the corresponding column of the other matrix)?

I have two matrices, A and B, with the same dimensions. I need to compute the correlation coefficient between [A(:,1), B(:,1)], [A(:,2), B(:,2)], ..., [A(:,n), B(:,n)], that is, between each column of A and the corresponding column of B. The output should be one correlation coefficient per column pair. How do I do this efficiently in MATLAB?

Accepted Answer

A = rand(10, 3);
B = rand(10, 3);
R = diag(corr(A, B))
R = 3×1
0.2872 0.3099 0.1910

9 Comments

Hi Ive, thanks a lot. It works :)
Hello Ive, I realize that your answer works for small matrices, but for, let's say, 80000*80000 it runs out of memory.
I am getting this error for my call diag(corr(tempim,recon)):
Error using *
Requested 80000x80000 (47.7GB) array exceeds maximum array size preference (15.8GB). This might cause MATLAB to
become unresponsive.
Error in internal.stats.corrPearson (line 93)
coef = x' * y; % 1/(n-1) doesn't matter, renormalizing anyway
Error in corr (line 212)
coef = corrFun(rows,tail,x,y);
Could you please clarify how to handle this kind of error? I have performed this operation sequentially in a loop, but it is very time-consuming. I would like a more efficient way to do it. Thanks.
The error you get is because corr(A, B) computes all pairwise column correlations, so for 80000 columns it needs an 80000x80000 result that does not fit into memory. You can either use tall arrays:
A = rand(10, 3); B = rand(10, 3);
tA = tall(A);
tB = tall(B);
R = gather(diag(corr(tA, tB)))
Or directly calculate Pearson's correlation coefficient column-wise:
A = rand(10, 3); B = rand(10, 3);
dA = A - mean(A, 1);
dB = B - mean(B, 1);
PearsonR = @(dA, dB) sum(dA.*dB)./sqrt(sum(dA.^2).*sum(dB.^2));
R1 = PearsonR(dA, dB).';
R2 = diag(corr(A, B)); % to compare (you don't need it for your large arrays)
all(abs(R1 - R2) < eps) % they are the same
ans = logical
1
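For anyone without the Statistics Toolbox (where corr lives), here is a minimal sketch that checks the same column-wise formula against a plain loop over corrcoef, which is part of base MATLAB. The variable names Rloop and maxDiff are just for illustration, and the implicit expansion in A - mean(A, 1) assumes R2016b or newer.

```matlab
% Sketch: compare the vectorized column-wise formula with a per-column loop.
A = rand(50, 4);
B = rand(50, 4);

% Vectorized column-wise Pearson correlation (same formula as above).
dA = A - mean(A, 1);
dB = B - mean(B, 1);
R1 = (sum(dA .* dB, 1) ./ sqrt(sum(dA.^2, 1) .* sum(dB.^2, 1))).';

% Reference: one corrcoef call per column pair.
n = size(A, 2);
Rloop = zeros(n, 1);
for k = 1:n
    C = corrcoef(A(:, k), B(:, k));  % 2-by-2 matrix; off-diagonal entry is r
    Rloop(k) = C(1, 2);
end

% Largest disagreement between the two results (rounding-error level).
maxDiff = max(abs(R1 - Rloop));
```

Neither version ever forms the n-by-n matrix of all pairwise correlations, so both only need memory proportional to the inputs.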
Still it doesn't resolve my problem computationally. It is taking too much time, and I don't know whether there is a way to do it faster.
I want to compute the correlation as described in my question. Let's try to understand how it could be faster. I need the correlation between a column of matrix A and the corresponding column of matrix B, which gives one correlation value. If I have N columns, I get N correlation values. All of these values are computed independently, each from one column of A and one column of B. Can't we compute them in parallel, all at once or in batches, to get the output faster? I think there should be a way to do it faster.
However, I am getting this error when running the code you gave:
A = rand(10, 3); B = rand(10, 3);
tA = tall(A);
tB = tall(B);
R = gather(diag(corr(tA, tB)))
Check for incorrect argument data type or missing argument in call to function 'diag'.
You can try this:
% R = diag(gather(corr(tA, tB)))
But it still generates a memory error.
Regarding your other comment: you can use parfor to calculate the coefficient for each column in a parallel loop, but you most probably won't get any benefit from it because the memory overhead would be the bottleneck.
You should avoid the corr function in this special case (unless there is another efficient way to use it) and use the other approach I suggested. It in fact benefits from multithreaded calculation:
A = rand(1000,80000);
B = rand(1000,80000);
dA = A - mean(A, 1);
dB = B - mean(B, 1);
PearsonR = @(dA, dB) sum(dA.*dB)./sqrt(sum(dA.^2).*sum(dB.^2));
tic;R1 = PearsonR(dA, dB);toc
Elapsed time is 0.745682 seconds.
So, I don't understand what you mean by "Still it doesn't resolve my problem computationally."
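On the question of working in batches: the temporaries dA.*dB, dA.^2 and dB.^2 are each as large as A, so if that is still too much memory, the same formula can be applied to blocks of columns. This is only a sketch under that assumption; blockSize is a hypothetical tuning value, not something from the thread, and the matrix sizes are kept small for illustration.

```matlab
% Sketch: column-wise Pearson correlation computed block by block.
A = rand(200, 1000);
B = rand(200, 1000);

blockSize = 256;            % hypothetical batch width; tune to available memory
n = size(A, 2);
R = zeros(n, 1);

for first = 1:blockSize:n
    cols = first:min(first + blockSize - 1, n);   % columns in this batch
    dA = A(:, cols) - mean(A(:, cols), 1);        % center each column
    dB = B(:, cols) - mean(B(:, cols), 1);
    R(cols) = sum(dA .* dB, 1) ./ sqrt(sum(dA.^2, 1) .* sum(dB.^2, 1));
end
```

Each column is handled independently, so the batched result should match the all-at-once result up to rounding. Whether it is actually faster depends on the machine; the main point is the lower peak memory.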
Thanks a lot, Ive. It finally works for me.
Could you explain how your last code works and why it is more efficient than the previous one? That would be great.
The previous one was technically the same. I included this line:
R2 = diag(corr(A, B)); % to compare (you don't need it for your large arrays)
only so you can see that my approach does exactly what the built-in corr function does, with the difference that it only calculates the column-wise correlation coefficients and not all pairwise combinations. So it is computationally more efficient, and thanks to MATLAB, it runs multithreaded.
Glad it worked for you.
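To put numbers on the difference, here is a small back-of-the-envelope sketch. It assumes double-precision storage (8 bytes per element) and only counts the size of the result, not the intermediate arrays.

```matlab
% Sketch: size of the full pairwise result vs. the column-wise result.
n = 80000;                 % number of columns, as in the error message
bytesPerDouble = 8;        % assumption: double-precision values

% corr(A, B) returns an n-by-n matrix of all pairwise correlations.
fullGiB = n^2 * bytesPerDouble / 2^30;

% The column-wise formula returns just n values.
colKiB = n * bytesPerDouble / 2^10;

fprintf('Full matrix: %.1f GiB, column-wise result: %.0f KiB\n', fullGiB, colKiB);
```

The first figure comes out at about 47.7, which lines up with the "Requested 80000x80000 (47.7GB) array" in the error above, while the column-wise result needs only 625 KiB.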

Asked: 4 Sep 2021

Commented: 7 Sep 2021
