How to quickly group numerical data without giving bin sizes
Dominik Rhiem
on 15 Aug 2023
Commented: Star Strider
on 16 Aug 2023
I am trying to find an efficient and quick way to group numerical data. In short, I have several paths towards a particular pixel, and these paths consist of rays of slightly different lengths (as any ray that crosses the pixel anywhere is valid for a path). These paths can therefore be considered groups of rays. I want to differentiate the paths by their (average) length and select the path that contains the largest number of rays, or, in other words, identify the groups and select the largest group.
Importantly though, I do not just need the length, but also an index to identify one ray, e.g. the "middle" one of the group. (Say I have an array of size 10, and the first 7 and last 3 elements form 2 groups. I would like to identify the groups, then, out of the 7 elements of the larger group, I would like to get the index of the 4th element as the "middle".)
My current solution is to round the ray lengths (to the third decimal, as the pixel size is on the millimeter scale) and use the "mode" function. However, this is both inefficient (because I want to do this column-wise for a matrix that also contains NaN values, which I would like to ignore) and in some cases inaccurate. For example:
array = [0.2248 0.2249 0.2250 0.2251 0.2399 0.2400 0.2401];
array2 = round(array,2);
mode(array2)
Of course, it would be logical to group the first four entries and the last three, but the rounding operation is ill-suited when the values vary around the .5. I have used the histogram function to plot examples in my code, and it groups the entries in a satisfactory way. However, I do not want the plot itself, I just need the grouping, and the histogram function seems to have a rather large overhead for this purpose (as this operation has to be performed thousands of times in a proper run of the program). The discretize function unfortunately needs me to give it an explicit number of bins, i.e. I would need an a priori idea of the groups.
Is there any function that can efficiently do this, or are there suggestions for a better way to do it myself than "mode"?
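One plot-free option relevant to the overhead concern above: histcounts applies automatic binning and returns the counts, bin edges, and a per-element bin index without creating any graphics. The following is a minimal sketch on the example vector, with one NaN added to mimic the matrix columns described above. The variable names are illustrative, and whether the automatic bins coincide with the intended groups depends on the data.

```matlab
% Example lengths with one NaN inserted (hypothetical test data)
lengths = [0.2248 0.2249 0.2250 0.2251 NaN 0.2399 0.2400 0.2401];

% Automatic binning without a plot; NaN entries receive bin index 0
[counts, edges, bin] = histcounts(lengths);

% Bin holding the most elements, and the positions of its members in 'lengths'
[~, fullest] = max(counts);
inFullest = find(bin == fullest);
```

Because NaN entries are assigned bin index 0, they never count towards any bin and do not need to be removed beforehand.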
Accepted Answer
Star Strider
on 16 Aug 2023
I am not certain that there is a robust approach to this sort of problem. For multivariable problems (where each point is a vector determined by more than one value), there are built-in clustering functions. This problem is somewhat unusual.
The data ideally need to be ordered (although that may not be an absolute requirement), because it is easier to calculate the differences if they are. This approach may be more than this particular problem needs, however I decided to make it a bit more robust so that it is appropriate for other problems. I cannot be certain it will be robust for all such problems, and it may need tweaking in some instances.
Try this —
array = [0.2248 0.2249 0.2250 0.2251 0.2399 0.2400 0.2401];
% array = [array array+0.51] % Test Vector
DifMtx = abs(array(:)-array) % Difference Matrix
[Col1,ixs] = sort(DifMtx(:,1)); % First Column & Indices
Col1Dif = diff([0; Col1]); % Ordered Column Differences
BP = [1; find(Col1Dif >= 5*min(Col1Dif(Col1Dif>0))); numel(Col1)+1]; % Break Points
for k = 1:numel(BP)-1
    idxrng = BP(k) : BP(k+1)-1; % Positions In Sorted 'Col1'
    Cluster{k} = array(ixs(idxrng)); % Map Back To 'array' Via 'ixs'
end
figure
hold on
for k = 1:numel(Cluster)
    stem(Cluster{k}, ones(size(Cluster{k})), '.', 'filled', 'DisplayName',["Cluster #"+k])
end
hold off
grid
xlim([0.22 0.245]) % Optional
ylim([0 2])
legend('Location','best')
xlabel('Array')
title('Clusters')
The ‘ixs’ vector returned by sort maps each position in the sorted ‘Col1’ vector back to the corresponding position in the original ‘array’ vector, if that information is needed.
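To get from the clusters to what the question asks for (the largest group and the index of its "middle" ray, with NaN ignored), something along the following lines may work for a single column. This is only a sketch under stated assumptions, not the method above: it breaks on gaps between sorted neighbors rather than on distances from the first element, it reuses the factor of 5 from the break-point line as an assumed threshold, and it takes "middle" to mean the middle element by length. The column 'col' and all variable names are hypothetical.

```matlab
% Hypothetical column of ray lengths, unsorted and containing NaN
col = [0.2401 NaN 0.2248 0.2249 0.2400 0.2250 0.2251 0.2399 NaN].';
gapFactor = 5;                          % assumed threshold, mirrors 5*min(...) above

valid = find(~isnan(col));              % original row indices of non-NaN entries
[vals, order] = sort(col(valid));       % lengths in ascending order
origIdx = valid(order);                 % original row index of each sorted length

gaps = diff(vals);                      % spacing between sorted neighbors
minGap = min(gaps(gaps > 0));           % smallest nonzero spacing
if isempty(minGap)
    isBreak = false(size(gaps));        % all values equal (or one value): one group
else
    isBreak = gaps >= gapFactor*minGap; % a large gap starts a new group
end

starts = [1; find(isBreak) + 1];        % first sorted position of each group
stops  = [find(isBreak); numel(vals)];  % last sorted position of each group
[groupSize, g] = max(stops - starts + 1);   % largest group (first one on ties)

members = origIdx(starts(g):stops(g));  % original row indices of the largest group
midIdx  = members(ceil(groupSize/2));   % "middle" ray by length (lower middle if even)
meanLen = mean(col(members));           % average length of the largest group
```

For this test column the largest group is the four entries near 0.225, and 'midIdx' points at the row holding 0.2249. Applying this column-wise to a matrix would mean wrapping the body in a loop over columns, and the threshold may need tuning for data where the within-group spacing is uneven.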
2 Comments
Star Strider
on 16 Aug 2023
As always, my pleasure!
I did my best to make it as robust as I could, however if you encounter a vector in which it has problems, post back and I will see if I can improve it to make it work with the new vector.