Error using pdist2mex Error in kmeans>distfun

Hi, to represent our data (3233477x256) as a bag of visual words (BoW), which uses k-means clustering to extract the visual words, we chose K = 5000, and the following error appears:
Error using pdist2mex
Requested 3233477x5000 (120.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a
long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
Error in kmeans>distfun (line 747)
D = pdist2mex(X,C,'sqe',[],[],[]);
Error in kmeans/loopBody (line 445)
D = distfun(X, C, distance, 0, rep, reps);
Error in internal.stats.parallel.smartForReduce (line 136)
reduce = loopbody(iter, S);
Error in kmeans (line 335)
ClusterBest = internal.stats.parallel.smartForReduce(...
Error in BOWHistogram (line 12)
[idx,c,sumd,D2] = kmeans(double(Tab_Feature_Data),NumClust);
What can I do to fix this error? Any advice is appreciated.
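The size in the error message follows directly from the problem dimensions: in-memory kmeans computes a full n-by-K matrix of squared distances (the `'sqe'` call to `pdist2mex` in the trace), one double per point/centroid pair. The arithmetic below reproduces the 120.5 GB figure. The 4 GiB per-block budget at the end is a hypothetical number, only there to show how many rows could be processed per block if the distances were computed in pieces.

```matlab
% Size of the n-by-K distance matrix kmeans tries to allocate in pdist2mex
n = 3233477;          % rows of Tab_Feature_Data
d = 256;              % feature dimension
K = 5000;             % NumClust
bytesPerDouble = 8;   % the data is converted with double(...)

distBytes = n * K * bytesPerDouble;
fprintf('%d x %d doubles = %.1f GB\n', n, K, distBytes / 2^30);   % prints 3233477 x 5000 doubles = 120.5 GB

% For comparison, the double-precision input itself is much smaller
inputGiB = n * d * bytesPerDouble / 2^30;

% Rows whose n-by-K distance block fits a (hypothetical) 4 GiB budget
budgetGiB = 4;
rowsPerBlock = floor(budgetGiB * 2^30 / (K * bytesPerDouble));
```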
Tripoli Settou on 18 Mar 2018
How can I do that? I need to extract the visual words from all of the data. Can I split the data and still get the visual words for all of it?
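One common way to split the work, sketched here as an assumption rather than something proposed in the thread, is to learn the vocabulary (the K centroids) on a random subsample and then assign every row of the full data to its nearest centroid block by block, so that only a K-by-block distance matrix exists at any time. The stand-in data, `sampleSize`, and `chunkSize` below are hypothetical; in the question `X` would be `double(Tab_Feature_Data)` and `K` would be `NumClust`.

```matlab
% Hypothetical stand-ins so the sketch runs on its own
rng(0);                          % reproducible subsample and k-means start
X = rand(20000, 16);             % stand-in feature matrix (rows = descriptors)
K = 50;                          % stand-in number of visual words
sampleSize = 5000;               % hypothetical subsample size
chunkSize = 4000;                % hypothetical rows per assignment block

% 1) Learn the vocabulary (centroids C) on a random subsample only
sampleRows = randperm(size(X, 1), sampleSize);
[~, C] = kmeans(X(sampleRows, :), K, 'MaxIter', 200);

% 2) Assign every row to its nearest visual word, one block at a time
n = size(X, 1);
idx = zeros(n, 1);               % visual-word index for every row
minDist = zeros(n, 1);           % squared distance to that visual word
for first = 1:chunkSize:n
    rows = first:min(first + chunkSize - 1, n);
    % 'Smallest',1 returns, for each row of the block, the nearest row of C
    [dist, nearest] = pdist2(C, X(rows, :), 'squaredeuclidean', 'Smallest', 1);
    idx(rows) = nearest(:);
    minDist(rows) = dist(:);
end
```

The resulting `idx` covers all rows, so BoW histograms can still be built for every image; the trade-off is that the centroids are estimated from the subsample instead of the full data.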

Bernhard Suhm on 25 Mar 2018
You could try converting your large input data into a tall array (maybe as simple as t = tall(double(Tab_Feature_Data))) and then passing that tall array to kmeans. Note, though, that only some kmeans options are available with tall arrays; see
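A minimal sketch of the tall-array route, assuming the Statistics and Machine Learning Toolbox version in use supports kmeans on tall arrays. The small random `X` and `K` are hypothetical stand-ins for `Tab_Feature_Data` and `NumClust`. Tall operations are deferred, so nothing is computed until `gather` is called.

```matlab
% Hypothetical stand-ins so the sketch runs; replace with the real data
rng(2);
X = rand(5000, 8);
K = 10;

t = tall(double(X));            % wrap the in-memory matrix as a tall array
[idx, C] = kmeans(t, K);        % kmeans on the tall array (evaluation deferred)
idx = gather(idx);              % evaluate and bring the labels into memory
C = gather(C);                  % returned unchanged if already in memory
```

Even on this route, a fourth output D2 would still describe an n-by-K array, so for the full data it should not be gathered into memory in one piece.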
Tripoli Settou on 26 Mar 2018
So you are advising me to use the Parallel Computing Toolbox to leverage multiple cores, right? Also, I need the fourth output variable (D2) in later work:
[idx,c,sumd,D2] = kmeans(double(Tab_Features),NumClust);
And can I use the Parallel Computing Toolbox without the tall version of kmeans, i.e., in-memory kmeans with the Parallel Computing Toolbox? Would that work?
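The stack trace in the question shows kmeans computing its distances inside `internal.stats.parallel.smartForReduce`, which loops over replicates (`rep, reps`). This suggests that the parallel option distributes replicates across workers rather than shrinking the n-by-K distance matrix each replicate needs. The sketch below is an assumption-labelled illustration, not a confirmed recipe. It runs replicates with `statset('UseParallel', true)` on a subsample. Then, because the full D2 would be n-by-K (about 120 GB here), it builds D2 in row blocks with `pdist2(..., 'squaredeuclidean')`, which matches kmeans's default squared Euclidean distance. The data, block size, and the "use each block" step are hypothetical.

```matlab
% Hypothetical stand-ins; in the thread these would be the features and NumClust
rng(1);
X = rand(10000, 16);
K = 40;
sample = X(randperm(size(X, 1), 3000), :);

% Parallel replicates (runs serially if no parallel pool/toolbox is available)
opts = statset('UseParallel', true);
[~, C, sumd] = kmeans(sample, K, 'Replicates', 4, 'Options', opts);

% Build D2 block by block instead of as one n-by-K matrix
chunkSize = 2500;
n = size(X, 1);
nearestDist = zeros(n, 1);       % example use of each block: nearest-centroid distance
for first = 1:chunkSize:n
    rows = first:min(first + chunkSize - 1, n);
    D2block = pdist2(X(rows, :), C, 'squaredeuclidean');   % numel(rows)-by-K slice of D2
    nearestDist(rows) = min(D2block, [], 2);
    % hypothetical: pass D2block to whatever the later step needs here
end
```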

carpcarp carpcarp on 2 Apr 2021
Walter Roberson on 2 Apr 2021
(The user points out that problems can arise if you accidentally have your own kmeans.m on the path, shadowing MATLAB's version.)
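A quick way to check for that situation is to list every `kmeans` MATLAB can see. The first entry returned by `which('kmeans', '-all')` is normally the one that runs. The check against a `toolbox/stats` folder below is a heuristic assumption about where the toolbox copy lives, not an official test.

```matlab
% List every kmeans on the MATLAB path; the first entry normally wins
paths = which('kmeans', '-all');
disp(paths)

% Heuristic: warn if the first kmeans is not under a toolbox/stats folder
isShadowed = ~isempty(paths) && ~contains(paths{1}, fullfile('toolbox', 'stats'));
if isShadowed
    warning('kmeans resolves to %s, not the toolbox version.', paths{1});
end
```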

