Evaluation of k-means performance in terms of 'sumd'

Question

Salad Box on 27 Sept 2019


Direct link to this question:

https://la.mathworks.com/matlabcentral/answers/482456-evaluation-of-k-means-performance-in-terms-of-sumd

Commented: Salad Box on 27 Sept 2019

Hi,

To give a simple example:

I have 4 data points p1, p2, p3, p4 (blue dots). I ran k-means twice with k = 2 and plotted the resulting centroids of the two clusters, C1 and C2 (green dots).

The two runs of kmeans were plotted side by side (left and right). Notice that in the second run (right), C2 and p2 are at the same location.

To compare the performance of k-means in these two runs, or to find out which of the two cases is the better clustering, do I just look at 'sumd', which is the sum of the distances from each point to the centroid of its cluster?

In this case, sumd for the left run is [0.5000, 0.5000], while sumd for the right run is [1.3333, 0].

To compare the two cases, do I just sum the 'sumd' values of left to get 1, sum those of right to get 1.3333, and then take the smaller number (1) and claim that left is better clustered?

Am I doing it correctly?
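To check the arithmetic, here is a minimal Python sketch (not MATLAB code). Two assumptions are mine, not the poster's: the coordinates of p1 to p4 are hypothetical corners of a unit square, chosen because they reproduce the quoted sumd values, and the left run is assumed to split the points into {p1, p3} and {p2, p4}. With kmeans' default 'sqeuclidean' distance, sumd holds sums of squared Euclidean distances, so that is what the sketch computes; the helper names are hypothetical.

```python
def centroid(points):
    """Mean of a list of 2-D points (the centroid used with 'sqeuclidean')."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def sumd(clusters):
    """Per-cluster sum of squared point-to-centroid distances."""
    result = []
    for pts in clusters:
        c = centroid(pts)
        result.append(sum(sq_dist(p, c) for p in pts))
    return result


# Hypothetical coordinates (corners of a unit square).
p1, p2, p3, p4 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)

left = sumd([[p1, p3], [p2, p4]])    # assumed split for the left run
right = sumd([[p1, p3, p4], [p2]])   # p2 alone, so C2 sits on p2

print(left, sum(left))    # [0.5, 0.5] 1.0
print(right, sum(right))  # about [1.3333, 0.0] and 1.3333
```

Summing each vector gives 1.0 for left and about 1.3333 for right, matching the totals in the question.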


Adam on 27 Sept 2019

There isn't a single definition of the 'best' clustering, so you have to pick one that you feel fits your case.

I don't have a sumd function in my MATLAB, so I don't know its specific details or why it gives two numbers (well, one per cluster seems obvious).

I tend to measure the quantisation error by summing the distance of each point to its assigned cluster node, which may or may not be the same thing. Relatively, though, normalising in that case would give an error of 0.5 for the left case and something like 0.65 for the right. So I guess that is the same as averaging your results (i.e. the average distance per cluster).
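The normalisation Adam describes can be sketched in Python using the sumd vectors quoted in the question (4/3 stands in for 1.3333); the helper names are hypothetical. Note that kmeans' default sumd holds squared distances, so it may differ from a plain-distance quantisation error; the sketch only applies the normalisation step. Averaging per cluster gives 0.5 and about 0.667 (Adam's "something like 0.65"), while averaging per point gives 0.25 and about 0.333.

```python
# sumd vectors as quoted in the question (4/3 stands in for 1.3333).
left = [0.5, 0.5]
right = [4 / 3, 0.0]
n_points = 4  # p1..p4


def mean_per_cluster(sumd):
    """Average of sumd over clusters."""
    return sum(sumd) / len(sumd)


def mean_per_point(sumd, n):
    """Total sumd divided by the number of data points."""
    return sum(sumd) / n


for name, s in (("left", left), ("right", right)):
    print(name,
          round(mean_per_cluster(s), 4),
          round(mean_per_point(s, n_points), 4))
# left 0.5 0.25
# right 0.6667 0.3333
```

With k and the number of points fixed, either normalisation just divides the total by a constant, so the ranking (left better than right) is unchanged; normalising only changes the picture when the runs being compared differ in k or in the number of points.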


Answer 1 (accepted answer)

the cyclist on 27 Sept 2019


Direct link to this answer:

https://la.mathworks.com/matlabcentral/answers/482456-evaluation-of-k-means-performance-in-terms-of-sumd#answer_393792

I agree with Adam's comment that there is not really a single "best" here. One way to think about it is in terms of a "utility function": what are you trying to achieve with the clustering, and can you write a mathematical function that captures that?

That being said, the sum of the sumd outputs is certainly a sensible, low-effort metric for the best clustering. After all, the kmeans algorithm itself is attempting to minimize the sumd values.
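The point that kmeans itself minimises sumd can be illustrated with a small Lloyd-style sketch in Python. It is not MATLAB's implementation; the helper names are hypothetical, and the data are hypothetical unit-square corners that reproduce the sumd values in the question. Each iteration assigns points to the nearest centroid and moves each centroid to the mean of its points, which never increases the total sumd, but it can stop at a local minimum. Starting from two diagonal corners, it converges to a three-plus-one split with total 4/3, the same shape as the right-hand run, while adjacent starting corners reach a total of 1. That is why running from several starting points and keeping the lowest total is common practice; MATLAB's kmeans exposes this through its 'Replicates' option.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def lloyd(points, init, max_iter=100):
    """Lloyd-style k-means from the given initial centroids.

    Returns (labels, centroids, sumd), where sumd[j] is the sum of squared
    distances from the points in cluster j to centroid j.
    """
    centroids = list(init)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = mean(members)
    sumd = [sum(sq_dist(p, centroids[j])
                for p, lab in zip(points, labels) if lab == j)
            for j in range(k)]
    return labels, centroids, sumd


# Hypothetical data: corners of a unit square, ordered p1, p2, p3, p4.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Try every pair of data points as starting centroids (feasible only for
# tiny data) and record the total sumd each run converges to.
runs = [lloyd(square, init) for init in combinations(square, 2)]
totals = [sum(sumd) for _, _, sumd in runs]

# Keep the run with the smallest total sumd.
best_labels, best_centroids, best_sumd = min(runs, key=lambda r: sum(r[2]))
print([round(t, 4) for t in totals])  # [1.0, 1.0, 1.3333, 1.3333, 1.0, 1.0]
print(best_sumd)                       # [0.5, 0.5]
```

Exhaustive seeding is used here only so the example is deterministic; real implementations pick starting centroids randomly or with k-means++ and repeat.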


Salad Box on 27 Sept 2019

Thanks for the confirmation. I'm going to use the sum of 'sumd' with confidence. :)

