Binning/Sorting 3D data sets

Question

Ramesh Venkatasubramanian el 27 de Oct. de 2020

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/627948-binning-sorting-3d-data-sets

Comentada: Ramesh Venkatasubramanian el 28 de Oct. de 2020

Respuesta aceptada: Adam Danz

exp.zip

Abrir en MATLAB Online

Hello all,

I have a data set with x, y, mass_of_particle (I have attached the data file). I wanted to sort the particle by creating new discretized x and y and accumulate the mass when it falls within the discretized positions as below,

A = dlmread('exp.txt','');
res = 400;
plane_size = 100;
st = -50;
en = 50;
B = zeros(res);
count = 0;
for x = 1:res
    for y = 1:res
        for i = 1:length(bbc)
            if A(i,1)*1e3 >= st  + ((res/plane_size) * (x-1)) ...
                    && A(i,1)*1e3 < st + ((res/plane_size) * x)...
                    && A(i,2)*1e3 >= st + ((res/plane_size) * (y-1))...
                    && A(i,2)*1e3 < st + ((res/plane_size)*y)
                B(x,y) = B(x,y) + A(i,3);
            end
        end
    end
end

The above code should give something like this, meaning it should accumulate the particle mass that falls into the discretization,

The problem that I am facing is that the above code becomes computationally expensive if the number of particles increase and I am looking for something shorter as well, like using accumarray.

Thank you.

3 comentarios
Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

Ramesh Venkatasubramanian el 27 de Oct. de 2020

Abrir en MATLAB Online

bbc is same as A.
Yes the mass of the particles are really small.
The current "for" loop implementation works, but the problem is it is computationally expensive when the number of particles increase.

I just noticed I made a typo in the code and I have also added few lines to the code, please refer to the corrected code below,

A = dlmread('exp.txt','');
res = 400;
plane_size = 100;
st = -50;
en = 50;
B = zeros(res+1);
count = 0;
dxy = plane_size/res;
for x = 1:res
    for y = 1:res
        for i = 1:length(A)
            if A(i,1)*1e3 >= st  + (dxy * (x-1)) ...
                    && A(i,1)*1e3 < st + (dxy * x)...
                    && A(i,2)*1e3 >= st + (dxy * (y-1))...
                    && A(i,2)*1e3 < st + (dxy * y)
                B(x,y) = B(x,y) + A(i,3);
            end
        end
    end
end
xx = -50:dxy:50;
yy = -50:dxy:50;
figure(1)
contourf(xx,yy,B,51);
colorbar;
colormap(jet);
xlim([-50 50]);
ylim([-50 50]);
zlim([0 inf]);

Adam Danz el 27 de Oct. de 2020

Editada: Adam Danz el 27 de Oct. de 2020

The problem you're solving is actually more complicated than counting the number of particles in a 2D grid. What you're doing is adding all elements in column 3 of nx3 matrix A according to the binning of columns 1 and 2.

Your approach is to loop through each bin (400x400=160000 bins) and within each bin, loop through every row of matrix A which has 401150 rows. That's a total of 400*400*401150 which is more than 64 billion iterations of the conditional statements and sum().

See my answer which uses vectorization to avoid loops and cuts down the execution time by a factor of 845.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Adam Danz el 27 de Oct. de 2020

1
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/627948-binning-sorting-3d-data-sets#answer_526273

Editada: Adam Danz el 27 de Oct. de 2020

Abrir en MATLAB Online

exp.zip

Excluding reading-in the data (first line), this entire solution takes about 0.25 seconds which is 845 times faster than the original version (3.5 minutes).

This uses histcounts2 to bin the 2D data (first 2 columns of A) and to identify which bin each element belongs to.

It then uses splitapply to add all values from the 3rd column of A within each bin and uses indexing to assign the results to B.

A = readmatrix('exp.txt');  % Use readmatrix instead of the outdated dlmread.
res = 400;
plane_size = 100;
st = -50;
en = 50;
count = 0;
dxy = plane_size/res;
A2 = A(:,1:2) .* [1e3, 1e3]; 
xEdge = st + dxy * (0:res); 
yEdge = xEdge; 
[N,~,~,binX,binY] = histcounts2(A2(:,1), A2(:,2), xEdge, yEdge); 
[unqBins,~,binID] = unique([binX,binY],'rows');
Asum = splitapply(@sum,A(:,3),binID);
B = zeros(size(N)); 
ind = sub2ind(size(B),unqBins(:,1),unqBins(:,2)); 
B(ind) = Asum;

To compare the results of B between my version and the original, I plotted each using imagesc(B).

Note that B in my version is 400x400 whereas yours is 401x401. This is because you're preallocating B as B=zeros(res+1) and I'm not sure what the +1 is for. If you preallocate B as B=zeros(res) then the maximum difference between your B and my B is 4.1359e-24 which is just round-off error associated with floating point arithmetic.

maxDifference = max(abs(your_B(:) - my_B(:))); % when B is preallocated as zeros(res)

Since B is a bin-count, there will always be 1 less bin than the number of edges since it takes 2 edges to define a bin. It wouldn't be too difficult to redfine the edges/bins.

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Ramesh Venkatasubramanian el 28 de Oct. de 2020

Thanks a lot Adam. It works like a charm.

Iniciar sesión para comentar.

Binning/Sorting 3D data sets

3 comentarios
Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

Respuesta aceptada

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Community Treasure Hunt

Binning/Sorting 3D data sets

3 comentarios Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

Respuesta aceptada

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Community Treasure Hunt

3 comentarios
Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos