Binning/Sorting 3D data sets
27 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Ramesh Venkatasubramanian
el 27 de Oct. de 2020
Comentada: Ramesh Venkatasubramanian
el 28 de Oct. de 2020
Hello all,
I have a data set with x, y, mass_of_particle (I have attached the data file). I wanted to sort the particle by creating new discretized x and y and accumulate the mass when it falls within the discretized positions as below,
A = dlmread('exp.txt','');
res = 400;
plane_size = 100;
st = -50;
en = 50;
B = zeros(res);
count = 0;
for x = 1:res
for y = 1:res
for i = 1:length(bbc)
if A(i,1)*1e3 >= st + ((res/plane_size) * (x-1)) ...
&& A(i,1)*1e3 < st + ((res/plane_size) * x)...
&& A(i,2)*1e3 >= st + ((res/plane_size) * (y-1))...
&& A(i,2)*1e3 < st + ((res/plane_size)*y)
B(x,y) = B(x,y) + A(i,3);
end
end
end
end
The above code should give something like this, meaning it should accumulate the particle mass that falls into the discretization,
The problem that I am facing is that the above code becomes computationally expensive if the number of particles increase and I am looking for something shorter as well, like using accumarray.
Thank you.
3 comentarios
Adam Danz
el 27 de Oct. de 2020
Editada: Adam Danz
el 27 de Oct. de 2020
The problem you're solving is actually more complicated than counting the number of particles in a 2D grid. What you're doing is adding all elements in column 3 of nx3 matrix A according to the binning of columns 1 and 2.
Your approach is to loop through each bin (400x400=160000 bins) and within each bin, loop through every row of matrix A which has 401150 rows. That's a total of 400*400*401150 which is more than 64 billion iterations of the conditional statements and sum().
See my answer which uses vectorization to avoid loops and cuts down the execution time by a factor of 845.
Respuesta aceptada
Adam Danz
el 27 de Oct. de 2020
Editada: Adam Danz
el 27 de Oct. de 2020
Excluding reading-in the data (first line), this entire solution takes about 0.25 seconds which is 845 times faster than the original version (3.5 minutes).
This uses histcounts2 to bin the 2D data (first 2 columns of A) and to identify which bin each element belongs to.
It then uses splitapply to add all values from the 3rd column of A within each bin and uses indexing to assign the results to B.
A = readmatrix('exp.txt'); % Use readmatrix instead of the outdated dlmread.
res = 400;
plane_size = 100;
st = -50;
en = 50;
count = 0;
dxy = plane_size/res;
A2 = A(:,1:2) .* [1e3, 1e3];
xEdge = st + dxy * (0:res);
yEdge = xEdge;
[N,~,~,binX,binY] = histcounts2(A2(:,1), A2(:,2), xEdge, yEdge);
[unqBins,~,binID] = unique([binX,binY],'rows');
Asum = splitapply(@sum,A(:,3),binID);
B = zeros(size(N));
ind = sub2ind(size(B),unqBins(:,1),unqBins(:,2));
B(ind) = Asum;
To compare the results of B between my version and the original, I plotted each using imagesc(B).
Note that B in my version is 400x400 whereas yours is 401x401. This is because you're preallocating B as B=zeros(res+1) and I'm not sure what the +1 is for. If you preallocate B as B=zeros(res) then the maximum difference between your B and my B is 4.1359e-24 which is just round-off error associated with floating point arithmetic.
maxDifference = max(abs(your_B(:) - my_B(:))); % when B is preallocated as zeros(res)
Since B is a bin-count, there will always be 1 less bin than the number of edges since it takes 2 edges to define a bin. It wouldn't be too difficult to redfine the edges/bins.
Más respuestas (0)
Ver también
Categorías
Más información sobre String Parsing en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!