clusterDBSCAN cannot handle the size of my input Matrix

VALENTINA CARMONA MONGUA on 10 May 2023
Answered: Shivam Malviya on 18 May 2023
I'm using clusterDBSCAN with a 376487-by-4 data matrix. At some point the algorithm tries to allocate a 376487-by-376487 matrix and fails with the following errors:
Error using zeros
Requested 376487x376487 (1056.1GB) array exceeds maximum array size preference (15.8GB). This might cause MATLAB to become unresponsive.
Error in phased.internal.AbstractClusterDBSCAN.calcPairwiseDist (line 1307)
D = zeros(nx,ny);
Error in phased.internal.AbstractClusterDBSCAN/dbscanHyperellipse (line 1171)
distances = obj.calcPairwiseDist(x,x);
Error in phased.internal.AbstractClusterDBSCAN/clusterdataDBSCAN (line 218)
[idxTmp,~] = dbscanHyperellipse(obj,xCluster);
Error in clusterDBSCAN/stepImpl (line 202)
[idx, clusterIDs] = clusterdataDBSCAN(obj,x,varargin{:});
I want to know how to use this algorithm on my data, or whether there is another way to run DBSCAN without hitting this problem. My code is:
% Xnorm is a 376487x4 matrix
clusterer = clusterDBSCAN('Epsilon',2,'MinNumPoints',8);
idx = clusterer(Xnorm);
I'm using MATLAB R2021b
I have also tried the function dbscan:
IDX = dbscan(X, EPSILON, MINPTS)
But it also takes too long and never finishes.

Answers (1)

Shivam Malviya on 18 May 2023
Hi Valentina,
I understand that you are facing an issue with "clusterDBSCAN" throwing an error due to the maximum array size preference of MATLAB.
To resolve this issue, the first option would be to raise the "MATLAB array size limit" in "Preferences > Workspace". However, the intermediate array needs 1056.1GB, which is far more memory than a typical machine has, so raising the limit is unlikely to help.
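As a quick check of where the 1056.1GB figure comes from, the short sketch below computes the size of a dense square matrix of doubles for the point count shown in the error message. The variable names are only illustrative, and it assumes the error message counts a GB as 2^30 bytes, which is what makes the numbers match.

```matlab
% Size of the full pairwise-distance matrix that clusterDBSCAN tries to allocate
n = 376487;                        % number of rows, taken from the error message
bytesNeeded = n^2 * 8;             % n-by-n doubles, 8 bytes per element
gbNeeded = bytesNeeded / 2^30;     % assumption: GB here means 2^30 bytes
fprintf('%.1f GB\n', gbNeeded);    % prints 1056.1 GB
```

Because the requirement grows with the square of the number of points, even halving the data would still need roughly 264GB.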
The second option is to use "dbscan", which does not require such a large amount of memory. However, with 376487 input points it may take some time to run.
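One rough way to judge how long "dbscan" might take on the full data is to time it on a random subsample first. The following is only a sketch: the subsample size and the random placeholder data are assumptions, and "dbscan" here is the Statistics and Machine Learning Toolbox function. Keep in mind that subsampling lowers the point density, so the clusters found on a subsample with the same Epsilon and MinNumPoints are not directly comparable to those on the full data.

```matlab
% Time dbscan on a random subsample before committing to the full data set
rng(0);                                   % reproducible subsample
Xnorm = randn(376487, 4);                 % placeholder; replace with the real data
nSub = 5000;                              % assumed subsample size, increase gradually
rows = randperm(size(Xnorm, 1), nSub);    % nSub distinct row indices
Xsub = Xnorm(rows, :);

tic;
idxSub = dbscan(Xsub, 2, 8);              % same Epsilon and MinNumPoints as in the question
tSub = toc;

nClusters = numel(unique(idxSub(idxSub > 0)));   % dbscan labels noise points as -1
fprintf('%d points: %.2f s, %d clusters\n', nSub, tSub, nClusters);
```

Repeating this for a few increasing values of nSub gives an idea of how the run time scales before trying all 376487 points.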
Another approach is to use a different clustering algorithm, such as k-means, which usually runs much faster on data of this size. Note that k-means needs the number of clusters up front, so the value of 100 in the example below is only a placeholder.
% Create dummy data
Xnorm = rand(376487, 4);
% Cluster the data
idx = kmeans(Xnorm, 100);
Please refer to the following links for a better understanding:
Regards,
Shivam

Release

R2021b
