Clustering using Gower's Distance
Hello all, I have a dataset with both categorical and numerical features, and I want to cluster it. I've read that Gower's distance (code is available) is suitable for mixed data types. However, I am getting an "isnan" error. How can I fix the problem? Thanks for the help.
DataSet = readtable("Test.xlsx", 'ReadVariableNames', true);
GowerDst = gower(DataSet);
[Idx, C] = kmedoids(DataSet, 2, 'Distance', GowerDst);
Error using isnan
Invalid data type. Argument must be numeric, char, or logical.
Error in kmedoids (line 220)
wasnan = any(isnan(X),2);
             ^^^^^^^^
Error in Gower_Distance (line 9)
[Idx, C] = kmedoids(DataSet, 2, 'Distance', GowerDst);
2 comments
  the cyclist on 22 Jul 2025
Can you upload the data, or a representative sample that illustrates the problem? You can use the paper clip icon in the INSERT section of the toolbar.
Answers (1)
  Torsten on 22 Jul 2025 (edited by Torsten on 22 Jul 2025)
To use a distance that is not built in, you have to pass a function handle. I guess that GowerDst is a numeric matrix rather than a function handle, so MATLAB errors.
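One further detail, offered as a reading of the stack trace rather than something stated in the thread: the message is raised by isnan(X) inside kmedoids, and X in the original call is the table DataSet. The sketch below uses a small made-up table (the variable names Num and Cat are hypothetical) to show that isnan rejects a table, which suggests the first argument to kmedoids needs to be a numeric matrix.

```matlab
% Made-up mixed-type table standing in for the contents of Test.xlsx.
T = table([1; 2; 3], {'a'; 'b'; 'a'}, 'VariableNames', {'Num', 'Cat'});

% isnan is the call that fails at kmedoids line 220 in the trace above.
try
    isnan(T);
    failed = false;
catch err
    failed = true;
    disp(err.message);   % print whatever message this release reports
end

% A numeric column vector of row indices is accepted without complaint.
rowIdx = (1:height(T))';
ok = ~any(isnan(rowIdx));
```

Passing row indices instead of the table is one way to keep the first argument numeric while the distance itself is looked up elsewhere.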
Look at the documentation for "kmedoids" for more details:
@distfun
Custom distance function handle. A distance function has the form
function D2 = distfun(ZI,ZJ)
% calculation of distance
...
where
- ZI is a 1-by-n vector containing a single observation.
- ZJ is an m2-by-n matrix containing multiple observations. distfun must accept a matrix ZJ with an arbitrary number of observations.
- D2 is an m2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).
If your data is not sparse, you can generally compute distance more quickly by using a built-in distance instead of a function handle.
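As a minimal sketch of the distfun form quoted above, the handle below implements a city block (L1) distance by hand. The data matrix X is made up for illustration, and the example assumes a MATLAB release with implicit expansion (R2016b or later) plus the Statistics and Machine Learning Toolbox for kmedoids.

```matlab
% Custom distance in the distfun form:
%   ZI is 1-by-n, ZJ is m2-by-n, the result is m2-by-1.
distfun = @(ZI, ZJ) sum(abs(ZJ - ZI), 2);

% Made-up numeric data: four observations, two features.
X = [0 0; 1 0; 0 2; 5 5];

% Distance from the first observation to each of the others.
D2 = distfun(X(1, :), X(2:end, :));   % [1; 2; 10]

% The same handle can be passed to kmedoids through the 'Distance' option.
idx = kmedoids(X, 2, 'Distance', distfun);
```

For a distance that MATLAB already provides, such as 'cityblock', the built-in name is normally the better choice; the handle is only needed when no built-in option fits.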
2 comments
  Torsten on 22 Jul 2025 (edited by Torsten on 22 Jul 2025)
See Edward Barnard's answer here:
I suggest you test whether this approach is correct for a built-in distance: run kmedoids once with the distance matrix supplied as below and once with the 'Distance','...' option, then compare the results.
Or take a look at
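DataSet = readtable("Test.xlsx", 'ReadVariableNames', true);
GowerDst = gower(DataSet);          % N-by-N matrix of pairwise Gower distances
K = 2;                              % number of clusters
N = size(GowerDst, 1);              % number of observations (18 in the original post)
% Cluster the row indices 1..N; the handle looks the distances up in GowerDst.
% C then holds the row indices of the medoids.
[idx, C, sumd] = kmedoids((1:N)', K, 'Distance', @(ZI, ZJ) GowerDst(ZJ, ZI));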
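function D = gower(data)
% Gower-style distance between the rows of a table with mixed column types.
[n, p] = size(data);
D = zeros(n, n);
for i = 1:p
    column = data{:, i};
    if isnumeric(column)
        % Numeric column: absolute difference scaled by the column range.
        colRange = max(column) - min(column);
        if colRange == 0
            continue;   % a constant column contributes no distance
        end
        d = abs(column - column') / colRange;
    elseif iscellstr(column) || isstring(column) || iscategorical(column)
        % Text or categorical column: 0 if the two values match, 1 otherwise.
        % unique handles cellstr, string, and categorical columns alike.
        [~, ~, code] = unique(column);
        d = double(code ~= code');
    else
        warning('Skipping column %d: unsupported data type', i);
        continue;
    end
    D = D + d;
end
D = D / p;   % average over all p columns, including any that were skipped
end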
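The comparison suggested above could be sketched as follows, assuming synthetic two-group data and 'cityblock' as the built-in distance under test (both are choices made for this illustration, not part of the thread). The same distance is used once by name and once as a precomputed matrix looked up through row indices, and the two clusterings are then compared.

```matlab
rng(0);                                    % reproducible synthetic data
X = [randn(10, 2); randn(10, 2) + 8];      % two well-separated made-up groups
N = size(X, 1);
K = 2;

% Run 1: built-in distance selected by name.
rng(1);
[idx1, ~, sumd1] = kmedoids(X, K, 'Distance', 'cityblock', 'Replicates', 5);

% Run 2: the same distance supplied as a precomputed N-by-N matrix.
Dmat = squareform(pdist(X, 'cityblock'));
rng(1);
[idx2, ~, sumd2] = kmedoids((1:N)', K, 'Distance', ...
    @(ZI, ZJ) Dmat(ZJ, ZI), 'Replicates', 5);

% Cluster labels are arbitrary, so compare the partitions up to relabeling,
% and compare the total within-cluster distance with a small tolerance.
samePartition = isequal(idx1 == idx1(1), idx2 == idx2(1));
sameCost = abs(sum(sumd1) - sum(sumd2)) < 1e-8;
```

If both flags come out true for a built-in distance, that is some evidence the index-lookup handle is wired correctly before it is trusted with the Gower matrix.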


