Find Optimal Number of Clusters Using Silhouette Criterion from Scratch in MATLAB
Hammad Younas
on 15 Feb 2023
Commented: Gian23
on 16 Feb 2023
Hello, I hope you are doing well. I am trying to find the optimal number of clusters using evalclusters with k-means and the silhouette criterion.
The built-in command takes a very long time to find the optimal number of clusters, so I am implementing this method from scratch. I have the following code, but the scores obtained by my from-scratch algorithm differ from those of the built-in function.
The dataset and the call to the built-in function are in the following section. evaluation.CriterionValues holds the silhouette score for each candidate K.
x =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
y =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
X = [x.' y.'];
dataset_len = size(X,1);
num_kmeans = 6;
%%
evaluation = evalclusters(X,"kmeans","silhouette","KList",1:num_kmeans)
evaluation.CriterionValues
Here is the code to implement this from scratch. array_silhoutte holds the score for each candidate K.
array_silhoutte = zeros(1,num_kmeans);
distance_a = [];
distance_b = [];
for j=1:num_kmeans
    [cluster_assignments,centroids] = kmeans(X,j,'Distance','sqeuclidean','Start','sample');
    %[~,grps_11]=grp2idx(cluster_assignments);
    for i = 1:dataset_len
        distance_a = [];
        distance_b = [];
        current_datapoint = X(i,:);
        for k=1:dataset_len
            if i~=k
                if (cluster_assignments(i)== cluster_assignments(k))
                    dist = pdist2( current_datapoint,X(k,:),'squaredeuclidean') ;
                    distance_a = [distance_a;dist];
                else
                    dist = pdist2( current_datapoint,X(k,:),'squaredeuclidean') ;
                    distance_b=[distance_b;dist];
                end
            end
        end
        Average_a=mean(distance_a);
        Average_b=mean(distance_b);
    end
    array_silhoutte(j) = (Average_b-Average_a)./max(Average_b, Average_a);
end
Can anybody help me make the from-scratch scores match those of the built-in function?
Accepted Answer
Marco Riani
on 16 Feb 2023
Edited: Marco Riani
on 16 Feb 2023
x =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
y =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
X = [x.' y.'];
dataset_len = size(X,1);
num_kmeans = 6;
evaluation = evalclusters(X,"kmeans","silhouette","KList",1:num_kmeans)
disp("Criterion values from evalclusters")
disp(evaluation.CriterionValues)
array_silhoutte = zeros(1,num_kmeans);
for j=1:num_kmeans
    % [cluster_assignments,centroids] = kmeans(X,j,'Distance','sqeuclidean','Start','sample');
    [cluster_assignments,centroids] = kmeans(X,j,'Replicates',100);
    % Average squared distance from each point to its own cluster (0 for singletons)
    avgDWithin=zeros(dataset_len,1);
    % Average squared distance from each point to every other cluster (Inf if not applicable)
    avgDBetween=Inf(dataset_len,j);
    for i=1:dataset_len
        for jj=1:j
            boo=cluster_assignments==cluster_assignments(i);
            Xsamecluster=X(boo,:);
            if size(Xsamecluster,1)>1
                % Divide by (cluster size - 1) because the point itself contributes a zero distance
                avgDWithin(i)=sum(sum((X(i,:)-Xsamecluster).^2,2))/(size(Xsamecluster,1)-1);
            end
            boo1= cluster_assignments~=cluster_assignments(i);
            Xdifferentcluster=X(boo1 & cluster_assignments ==jj,:);
            if ~isempty(Xdifferentcluster)
                avgDBetween(i,jj)=mean(sum((X(i,:)-Xdifferentcluster).^2,2));
            end
        end
    end
    % Calculate the silhouette values
    % Use the nearest other cluster for each point, then average over all points
    minavgDBetween = min(avgDBetween, [], 2);
    silh = (minavgDBetween - avgDWithin) ./ max(avgDWithin,minavgDBetween);
    array_silhoutte(j) =mean(silh);
end
disp("Criterion values computed manually")
disp(array_silhoutte)
I slightly rewrote your code and added 'Replicates',100 to the call to kmeans. Please let me know whether everything is now clear. Of course, k-means does not take into account the correlation among the variables, and it is not robust to atypical observations. Anyway, that is another story.
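For reference, the quantity that the rewritten loop computes can be written out as below. This is a sketch of the standard silhouette definition with squared Euclidean distance, as read off the code above; the symbols $C_I$, $a_i$, $b_i$, $s_i$ and $\bar{s}(k)$ are notation introduced here, not names from the thread.

```latex
% Point x_i belongs to cluster C_I; n is the number of points.
a_i = \frac{1}{|C_I| - 1} \sum_{\substack{j \in C_I \\ j \neq i}} \lVert x_i - x_j \rVert^2
\qquad (a_i = 0 \text{ when } |C_I| = 1)

b_i = \min_{J \neq I} \; \frac{1}{|C_J|} \sum_{j \in C_J} \lVert x_i - x_j \rVert^2

s_i = \frac{b_i - a_i}{\max(a_i,\, b_i)}
\qquad
\bar{s}(k) = \frac{1}{n} \sum_{i=1}^{n} s_i
```

Compared with this, the loop in the question averages the distances to all points outside the cluster instead of taking the minimum over the other clusters, and it evaluates the score only from the last point's Average_a and Average_b instead of averaging $s_i$ over all points. For k = 1 there is no other cluster, so $b_i$ stays at Inf in the code and the resulting score is NaN.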
Best
Marco
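As an independent cross-check, the per-point values can also be obtained from the silhouette function in Statistics and Machine Learning Toolbox and averaged for each k. This is only a sketch: the variable name check_silhouette and the rng seed are choices made here, not part of the thread, and agreement with evalclusters still depends on kmeans converging to the same partition.

```matlab
% Same data set as in the thread
x = [[0.1 0.2 0.15 0.2 0.21] 1+[0.1 0.2 0.15 0.2 0.21]];
y = [[0.1 0.2 0.15 0.2 0.21] 1+[0.1 0.2 0.15 0.2 0.21]];
X = [x.' y.'];
num_kmeans = 6;

rng(1);  % assumed seed, only to make the k-means runs repeatable

% k = 1 is left as NaN because no "other" cluster exists
check_silhouette = NaN(1, num_kmeans);
for k = 2:num_kmeans
    % Many replicates reduce the chance of ending in a poor local optimum
    idx = kmeans(X, k, 'Replicates', 100);
    % Per-point silhouette values with squared Euclidean distance
    s = silhouette(X, idx, 'sqEuclidean');
    check_silhouette(k) = mean(s);
end
disp(check_silhouette)
```

If these values, the manual loop, and evaluation.CriterionValues all agree for a given k, the remaining differences seen earlier most likely came from the clustering step rather than from the silhouette formula.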