How can the following code be optimized / vectorized?

How can you optimize the following? It is painfully slow and takes around 5 minutes to run on my computer. Here I am calcuatling the distances between atoms in molecules while taking account periodic boundary conditions (which mean an atom on the edge of the box might actually belong to a molecule that wraps around, so to speak). I imagine this could be done in a far more efficient and speedy way than my Fortran-esque implementation.
A % an array where column 1 is the atom ID, column 2 is the molecule ID, and columns 3-5 are the positions in x, y, and z
N = 200; % molecules in system
Sites = 150; % sites in each molecule
length = 120; % length of box on each side
SitesInSystem = N*Sites;
B = zeros(SitesInSystem,SitesInSystem);
% center box at origin
A(:,3) = A(:,3) - length/2;
A(:,4) = A(:,4) - length/2;
A(:,5) = A(:,5) - length/2;
for ii = 1:SitesInSystem
for jj = 1:SitesInSystem
if ii ~= jj % if not the same site
dx = A(jj,3) - A(ii,3);
dy = A(jj,4) - A(ii,4);
dz = A(jj,5) - A(ii,5);
% taking into account periodic boundaries
if (dx > length * 0.5)
dx = dx - length;
elseif (dx <= -length * 0.5)
dx = dx + length;
end
if (dy > length * 0.5)
dy = dy - length;
elseif (dy <= -length * 0.5)
dy = dy + length;
end
if (dz > length * 0.5)
dz = dz - length;
elseif (dz <= -length * 0.5)
dz = dz + length;
end
% calculate distances
B(ii,jj) = sqrt(dx^2+dy^2+dz^2);
end
end
end
B = triu(B); % to avoid double counting
Perhaps this also can be combined with the following snippet, which calculates something based on the values of the matrix B:
for ii = 1:N % molecule
for jj = 1:Sites % site
for kk = 1:N
for mm = 1:Sites
if ii ~= kk % if not the same molecule
if (B(ii*jj,kk*mm) < 1.75) && (B(ii*jj,kk*mm) > 0)
% do a calculation, i.e., add edge ii-kk to a graph
% with N nodes
end
end
end
end
end
end

4 comentarios

KSSV
KSSV el 16 de Mzo. de 2022
It looks like this can be achieved with ease. Why don't you attach the data for A. What is its size? I feel it is a column major array. I am surprised with the loop indexing looking at A.
L'O.G.
L'O.G. el 16 de Mzo. de 2022
Editada: L'O.G. el 16 de Mzo. de 2022
A is quite large: 30000 (rows) x 5 (columns), since N (molecules) is 200 here and Sites (atoms) are 150. As mentioned in my attempt, column 1 is the atom ID (integer from 0 to 29999), column 2 is the molecule ID (integer from 0 to 199), and columns 3-5 are the positions in x, y, and z, respectively (integer for now, but could be a floating point number). The attached is a very pared down example (again, not the whole data, which is quite large). The example just has two molecules consisting of 137 atoms each (rather than 150 atoms per molecule as in my code above). Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array.
Stephen23
Stephen23 el 16 de Mzo. de 2022
"Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array."
READMATRIX would be simpler and more efficient: https://www.mathworks.com/help/matlab/ref/readmatrix.html
L'O.G.
L'O.G. el 16 de Mzo. de 2022
@Stephen Thanks. I didn't know about that function!

Iniciar sesión para comentar.

 Respuesta aceptada

Bruno Luong
Bruno Luong el 16 de Mzo. de 2022
Editada: Bruno Luong el 16 de Mzo. de 2022
It takes about 25 seconds only PC
Ac35 = A(:,3:5);
Ai = reshape(Ac35,[],1,3);
B = zeros(SitesInSystem,SitesInSystem);
for j=1:SitesInSystem
Aj = reshape(Ac35(j,:),1,[],3);
dxyz = Ai-Aj;
dxyz = mod(dxyz+length/2,length)-length/2;
B(:,j) = sqrt(sum(dxyz.^2,3));
end

Más respuestas (2)

KSSV
KSSV el 16 de Mzo. de 2022
I think pdist2, this inbuilt function will work for you.
T = readtable('Sample.xlsx') ;
A = table2array(T) ;
A = A(:,3:5) ; % pick coordinates (x,y,z)
d = pdist2(A,A) ;
B = triu(d) ; % you may check this with you answer. Use isequal to check.

3 comentarios

L'O.G.
L'O.G. el 16 de Mzo. de 2022
Editada: L'O.G. el 16 de Mzo. de 2022
@KSSV Thanks, I really appreciate it. Two things: first, due to the calculation of the differences dx, dy, and dz that take into account periodic boundary calculations, is there an easy way of incorporating that without all of the for loops? In your proposed solution, that's not taken into account. Second, is the second set of for loops necessary to do the calculations I mention using B?
KSSV
KSSV el 16 de Mzo. de 2022
I have checked your code and pdist2 for the provided 274*5 data. Both the methods, gave same solution.
L'O.G.
L'O.G. el 16 de Mzo. de 2022
Editada: L'O.G. el 16 de Mzo. de 2022
@KSSV Right, but for the full case (not the sample file I gave) I would need to consider that some atoms might be at the edges of the box, so that periodic boundaries as applied to that molecule must be taken into account. Sorry I didn't make that clearer. I mentioned it in my initial query and coded it using the if statements nested in the orignal set of for loops, but I guess not very clearly.

Iniciar sesión para comentar.

You can see that the rem() based expression I use here can be used to vectorize the calculation.
format bank
Length = 5;
x = -3.5 : .5 : 3.5
x = 1×15
-3.50 -3.00 -2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50
for K = 1 : length(x)
dx = x(K);
if (dx > Length * 0.5)
dx = dx - Length;
elseif (dx <= -Length * 0.5)
dx = dx + Length;
end
newx_if(K) = dx;
end
newx_mod = rem((x - Length/2), Length) + Length/2;
newx_if
newx_if = 1×15
1.50 2.00 2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 -2.00 -1.50
newx_mod
newx_mod = 1×15
1.50 2.00 2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50
newx_if - newx_mod
ans = 1×15
0 0 0 0 0 0 0 0 0 0 0 0 0 -5.00 -5.00

3 comentarios

L'O.G.
L'O.G. el 16 de Mzo. de 2022
Editada: L'O.G. el 16 de Mzo. de 2022
@Walter Roberson Thanks very much. I am trying to understand what you did. I guess I should actually take the differences dx (and dy and dz) before using rem() to take into account the boundaries. How would I do that efficiently? Generating all distances (for x, for example) by something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow.
newxyz_mod = rem((A - Length/2), Length) + Length/2; % for x, y, z
looks fine, as long as Length is the same in all three directions.
You would follow it with
B = sqrt( sum(newxyz_mod.^2, 2) );
L'O.G.
L'O.G. el 16 de Mzo. de 2022
Editada: L'O.G. el 16 de Mzo. de 2022
@Walter Roberson Got it. Thank you! And about my other question, the differences -- rem() should use the differences, correct? Not the positions themselves? That's my understanding, but I don't know how to calculate that efficiently. As I mentioned in my previous comment, something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow, and that's just for x! I think there will be (N*Sites)^2 / 2 possible distances which will fill an upper triangular matrix. I'm not sure how to code that efficiently here, though.

Iniciar sesión para comentar.

Categorías

Productos

Versión

R2021b

Preguntada:

el 16 de Mzo. de 2022

Comentada:

el 16 de Mzo. de 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by