use a custom distance with the kmeans

38 visualizaciones (últimos 30 días)
Emanuele Gandolfi
Emanuele Gandolfi el 13 de En. de 2022
Comentada: Walter Roberson el 15 de En. de 2022
Hello everyone.
I'm not very good with matlab so I ask you for help. For a university project I need to be able to group users who are furthest away from each other within a rectangular area. I am using kmeans and I have two possibilities: the first is to create a custom function, but I have read that I should use kMedoids; the second is to pass it the custom distance matrix.
At the moment I am following the second path but I do not understand how to do it. I am attaching the code of the one done so far.
N = 10
x=rand(N,1)*5
y=rand(N,1)*2.5
figure
scatter(x,y)
M = [x,y]
num_medoids = 1;
eucli_dis = pdist(M);
eucli_dis = squareform(eucli_dis);
inv_eucli_dis = 1./eucli_dis;
for ii = 1:10
text(M(ii,1),M(ii,2),num2str(ii));
end
gscatter(M(:,1),M(:,2))
  6 comentarios
Walter Roberson
Walter Roberson el 13 de En. de 2022
What is the code for your custom distance function?
What is your code for your call to kmedoids ?
Emanuele Gandolfi
Emanuele Gandolfi el 14 de En. de 2022
Editada: Walter Roberson el 14 de En. de 2022
the code for the custom distance function:
function inv_eucli_dis = customeucli (x,y)
M = [x,y]
eucli_dis = 1+sqrt(sum(bsxfun(@minus,M,reshape(M',1,size(M,2),size(M,1))).^2,2)); %calculation of the Euclidean distance
inv_eucli_dis = 1/eucli_dis; %inverse distance calculation
dist_mat = squeeze(inv_eucli_dis); %result representation in matrix form
code for kmedoids:
[idx,C,sumd,D] = Ckmedoids (M,num_medoids, 'distance','customeucli');

Iniciar sesión para comentar.

Respuesta aceptada

Image Analyst
Image Analyst el 14 de En. de 2022
Here is code to randomly lay down points and draw thin black lines between a pair of points if they are far away from each other and draw a green line between endpoints of a pair if the points are close together:
N = 10
x=rand(N,1)*5
y=rand(N,1)*2.5
plot(x, y, 'b.', 'MarkerSize', 30);
grid on;
xy = [x(:), y(:)];
distances = pdist2(xy, xy)
% Zero out lower triangle because it's a repeat of the upper triangle
distances = triu(distances)
nonZeroIndexes = distances > 0;
medianDistance = median(distances(nonZeroIndexes))
thresholdValue = medianDistance/2; % Whatever you want.
% Find pairs that are far apart.
[rows, columns] = find(distances > thresholdValue);
hold on;
% Plot pairs that are far apart.
for k = 1 : length(rows)
index1 = columns(k);
index2 = rows(k);
xp = [x(index1), x(index2)];
yp = [y(index1), y(index2)];
plot(xp, yp, 'k-', 'LineWidth', 1, 'MarkerSize', 30);
end
% Find pairs that are close together.
[rows2, columns2] = find((distances > 0) & (distances <= thresholdValue));
hold on;
% Plot pairs that are close together.
for k = 1 : length(rows2)
index1 = columns2(k);
index2 = rows2(k);
xp = [x(index1), x(index2)];
yp = [y(index1), y(index2)];
plot(xp, yp, 'r-', 'LineWidth', 2, 'MarkerSize', 30);
end
title('Black lines are far away, red lines are close')
  7 comentarios
Emanuele Gandolfi
Emanuele Gandolfi el 15 de En. de 2022
yes I've seen them. however, I wanted to know for sure if it could also be solved with kmeans by using a custom distance function or by passing it an inverse distance matrix. that's all
Walter Roberson
Walter Roberson el 15 de En. de 2022
I wanted to know for sure if it could also be solved with kmeans by using a custom distance function or by passing it an inverse distance matrix.
No, it cannot be solve that way "for sure". Unless there are exactly two choices in two dimensions, then The Voting Paradox shows that there is no possible algorithm that can reliably generate optimal outcomes for all nodes.

Iniciar sesión para comentar.

Más respuestas (2)

Image Analyst
Image Analyst el 13 de En. de 2022
Not sure why you think there are clusters. I'd just use pdist2() and then threshold to find points that are farther apart than some distance. Something like
xy = [x(:), y(:)];
distances = pdist2(xy, xy);
thresholdValue = .3; % Whatever you want.
[rows, columns] = find(distances > thresholdValue);

Image Analyst
Image Analyst el 14 de En. de 2022
Editada: Image Analyst el 14 de En. de 2022
You might also consider SVM. It tries to find a dividing line between two groups such that the gap between the two groups is widest so the two groups are farthest apart. See
or use dbscan (demo attached):
It tries to find all points that can be connected with a distance less than what you specify:
A point that is found to be in a cluster with more than a certain number of close neighbors is called a "core point". It can also be part of the cluster if any points are within that distance. So for example, a dumbbell shape could have core points in the ends, connected points in the middle, and the whole thing being one single cluster. If it's an isolated point not closer to any other point than your specified distance, it's not a core point. (Like point N above.)
In this diagram, minPts = 4. Point A and the other red points are core points, because the area surrounding these points in an ε radius contain at least 4 points (including the point itself). Because they are all reachable from one another, they form a single cluster. Points B and C are not core points, but are reachable from A (via other core points) and thus belong to the cluster as well. Point N is a noise point that is neither a core point nor directly-reachable. You can consider it as essentially being in a clluster all by itself.

Categorías

Más información sobre Statistics and Machine Learning Toolbox en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by