Binning/Sorting 3D data sets

27 visualizaciones (últimos 30 días)
Ramesh Venkatasubramanian
Ramesh Venkatasubramanian el 27 de Oct. de 2020
Comentada: Ramesh Venkatasubramanian el 28 de Oct. de 2020
Hello all,
I have a data set with x, y, mass_of_particle (I have attached the data file). I wanted to sort the particle by creating new discretized x and y and accumulate the mass when it falls within the discretized positions as below,
A = dlmread('exp.txt','');
res = 400;
plane_size = 100;
st = -50;
en = 50;
B = zeros(res);
count = 0;
for x = 1:res
for y = 1:res
for i = 1:length(bbc)
if A(i,1)*1e3 >= st + ((res/plane_size) * (x-1)) ...
&& A(i,1)*1e3 < st + ((res/plane_size) * x)...
&& A(i,2)*1e3 >= st + ((res/plane_size) * (y-1))...
&& A(i,2)*1e3 < st + ((res/plane_size)*y)
B(x,y) = B(x,y) + A(i,3);
end
end
end
end
The above code should give something like this, meaning it should accumulate the particle mass that falls into the discretization,
The problem that I am facing is that the above code becomes computationally expensive if the number of particles increase and I am looking for something shorter as well, like using accumarray.
Thank you.
  3 comentarios
Ramesh Venkatasubramanian
Ramesh Venkatasubramanian el 27 de Oct. de 2020
  • bbc is same as A.
  • Yes the mass of the particles are really small.
  • The current "for" loop implementation works, but the problem is it is computationally expensive when the number of particles increase.
I just noticed I made a typo in the code and I have also added few lines to the code, please refer to the corrected code below,
A = dlmread('exp.txt','');
res = 400;
plane_size = 100;
st = -50;
en = 50;
B = zeros(res+1);
count = 0;
dxy = plane_size/res;
for x = 1:res
for y = 1:res
for i = 1:length(A)
if A(i,1)*1e3 >= st + (dxy * (x-1)) ...
&& A(i,1)*1e3 < st + (dxy * x)...
&& A(i,2)*1e3 >= st + (dxy * (y-1))...
&& A(i,2)*1e3 < st + (dxy * y)
B(x,y) = B(x,y) + A(i,3);
end
end
end
end
xx = -50:dxy:50;
yy = -50:dxy:50;
figure(1)
contourf(xx,yy,B,51);
colorbar;
colormap(jet);
xlim([-50 50]);
ylim([-50 50]);
zlim([0 inf]);
Adam Danz
Adam Danz el 27 de Oct. de 2020
Editada: Adam Danz el 27 de Oct. de 2020
The problem you're solving is actually more complicated than counting the number of particles in a 2D grid. What you're doing is adding all elements in column 3 of nx3 matrix A according to the binning of columns 1 and 2.
Your approach is to loop through each bin (400x400=160000 bins) and within each bin, loop through every row of matrix A which has 401150 rows. That's a total of 400*400*401150 which is more than 64 billion iterations of the conditional statements and sum().
See my answer which uses vectorization to avoid loops and cuts down the execution time by a factor of 845.

Iniciar sesión para comentar.

Respuesta aceptada

Adam Danz
Adam Danz el 27 de Oct. de 2020
Editada: Adam Danz el 27 de Oct. de 2020
Excluding reading-in the data (first line), this entire solution takes about 0.25 seconds which is 845 times faster than the original version (3.5 minutes).
This uses histcounts2 to bin the 2D data (first 2 columns of A) and to identify which bin each element belongs to.
It then uses splitapply to add all values from the 3rd column of A within each bin and uses indexing to assign the results to B.
A = readmatrix('exp.txt'); % Use readmatrix instead of the outdated dlmread.
res = 400;
plane_size = 100;
st = -50;
en = 50;
count = 0;
dxy = plane_size/res;
A2 = A(:,1:2) .* [1e3, 1e3];
xEdge = st + dxy * (0:res);
yEdge = xEdge;
[N,~,~,binX,binY] = histcounts2(A2(:,1), A2(:,2), xEdge, yEdge);
[unqBins,~,binID] = unique([binX,binY],'rows');
Asum = splitapply(@sum,A(:,3),binID);
B = zeros(size(N));
ind = sub2ind(size(B),unqBins(:,1),unqBins(:,2));
B(ind) = Asum;
To compare the results of B between my version and the original, I plotted each using imagesc(B).
Note that B in my version is 400x400 whereas yours is 401x401. This is because you're preallocating B as B=zeros(res+1) and I'm not sure what the +1 is for. If you preallocate B as B=zeros(res) then the maximum difference between your B and my B is 4.1359e-24 which is just round-off error associated with floating point arithmetic.
maxDifference = max(abs(your_B(:) - my_B(:))); % when B is preallocated as zeros(res)
Since B is a bin-count, there will always be 1 less bin than the number of edges since it takes 2 edges to define a bin. It wouldn't be too difficult to redfine the edges/bins.
  1 comentario
Ramesh Venkatasubramanian
Ramesh Venkatasubramanian el 28 de Oct. de 2020
Thanks a lot Adam. It works like a charm.

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre String Parsing en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by