Main Content

cmdscale

Classical multidimensional scaling

Description

Y = cmdscale(D) performs classical multidimensional scaling on the n-by-n distance or dissimilarity matrix D, and returns an n-by-p configuration matrix. The rows of Y correspond to the coordinates of n points in a p-dimensional space, where p < n.

When D is a Euclidean distance matrix, its elements are the pairwise distances between the n points, and p is the dimension of the smallest space in which these points can be embedded.

When D is a non-Euclidean distance matrix or a dissimilarity matrix, p is the number of positive eigenvalues of Y*Y'. In this case, the reduction to p or fewer dimensions provides a reasonable approximation to D only if the negative eigenvalues of Y*Y' are small in magnitude.

Y = cmdscale(D,p) returns a configuration matrix with the embedding dimensionality p, where p is a positive integer less than or equal to n. If a p-dimensional embedding is possible, then Y has size n-by-p. If only a q-dimensional embedding with q < p is possible, then Y has size n-by-q.

[Y,e] = cmdscale(___) additionally returns the eigenvalues of Y*Y' as a numeric column vector of length n, using any of the input argument combinations in the previous syntaxes. If you specify p, then e has length p.

example

Examples

collapse all

Construct a map of 10 US cities based on the geographical distances between them by using the cmdscale function.

Create a distance matrix D that contains the intercity distances in miles. Because D is a full distance matrix, it is square and symmetric, with zeros on the diagonal and positive values off the diagonal.

cities = ...
{"Atl","Chi","Den","Hou","LA","Mia","NYC","SF","Sea","WDC"};
D = [    0  587 1212  701 1936  604  748 2139 2182   543;
       587    0  920  940 1745 1188  713 1858 1737   597;
      1212  920    0  879  831 1726 1631  949 1021  1494;
       701  940  879    0 1374  968 1420 1645 1891  1220;
      1936 1745  831 1374    0 2339 2451  347  959  2300;
       604 1188 1726  968 2339    0 1092 2594 2734   923;
       748  713 1631 1420 2451 1092    0 2571 2408   205;
      2139 1858  949 1645  347 2594 2571    0  678  2442;
      2182 1737 1021 1891  959 2734 2408  678    0  2329;
       543  597 1494 1220 2300  923  205 2442 2329     0];

Pass the distance matrix to the cmdscale function, and return the configuration matrix and eigenvalues.

[Y,e] = cmdscale(D)
Y = 10×6
103 ×

   -0.7188    0.1430    0.0351   -0.0012   -0.0074    0.0015
   -0.3821   -0.3408    0.0296   -0.0082   -0.0120   -0.0023
    0.4816   -0.0253    0.0534    0.0013    0.0157   -0.0010
   -0.1615    0.5728    0.0015   -0.0018   -0.0007    0.0027
    1.2037    0.3901   -0.0186    0.0150   -0.0032   -0.0017
   -1.1335    0.5819   -0.0323   -0.0024    0.0030   -0.0020
   -1.0722   -0.5190   -0.0343   -0.0143    0.0064    0.0003
    1.4206    0.1126   -0.0078   -0.0181   -0.0008    0.0009
    1.3417   -0.5797   -0.0237    0.0060   -0.0014    0.0006
   -0.9796   -0.3355   -0.0029    0.0237    0.0004    0.0010

e = 10×1
106 ×

    9.5821
    1.6868
    0.0082
    0.0014
    0.0005
    0.0000
    0.0000
   -0.0009
   -0.0055
   -0.0355

The configuration matrix Y is 10-by-6 because there are only six positive eigenvalues. Some eigenvalues are negative, indicating that the original distances are non-Euclidean. This result occurs because the Earth's surface is curved.

The two largest positive eigenvalues are much larger in magnitude than the others. Therefore, the first two coordinates of Y are sufficient for a reasonable approximation to D, despite the negative eigenvalues.

Calculate the maximum relative difference between D and an approximation that uses the largest two eigenvalues.

Dapprox = squareform(pdist(Y(:,1:2)));
MaxRelDiff = max(abs(D-Dapprox),[],"all")/max(D,[],"all")
MaxRelDiff = 
0.0075

The reconstructed distance matrix provides a good approximation of the original distance matrix.

Plot the reconstructed city locations as a map. Because D only contains information about the intercity distances, the geographical orientation of the reconstruction is arbitrary.

scatter(Y(:,1),Y(:,2),".")
text(Y(:,1)+25,Y(:,2),cities)
xlabel("Miles")
ylabel("Miles")

Figure contains an axes object. The axes object with xlabel Miles, ylabel Miles contains 11 objects of type scatter, text.

Copyright 2012–2024 The MathWorks, Inc.

Determine how the quality of the distance matrix approximation varies when you reduce points to distances using different metrics.

Randomly generate 10 points in 4-D space that are close to 3-D space. Apply a linear transformation to the points so that their transformed values are close to a 3-D subspace that does not align with the coordinate axes.

rng(0,"twister"); % For reproducibility
A = [normrnd(0,1,10,3),normrnd(0,0.1,10,1)];
B = randn(4,4);
X = A*B;

Create a distance matrix for the points in X using the default Euclidean metric.

D = pdist(X);

Create a configuration matrix Y from the interpoint distances using the cmdscale function.

Y = cmdscale(D);

Compare the quality of the reconstructions when using 2, 3, or 4 dimensions.

maxerr2 = max(abs(pdist(X)-pdist(Y(:,1:2)))) 
maxerr2 = 
0.1631
maxerr3 = max(abs(pdist(X)-pdist(Y(:,1:3)))) 
maxerr3 = 
0.0187
maxerr4 = max(abs(pdist(X)-pdist(Y)))
maxerr4 = 
1.1768e-14

The small maxerr3 value indicates that the first three dimensions provide a good reconstruction of the distance matrix.

Create a distance matrix for the points in X using the cityblock metric.

D = pdist(X,"cityblock");

Create a configuration matrix Y from the interpoint distances and return the eigenvalues.

[Y,e] = cmdscale(D)
Y = 10×6

   -6.0490    0.7237    0.0722   -0.5919    0.3802    0.1243
   11.3882   -0.0401    2.0088    0.0420   -0.0079   -0.1248
   -9.6056   -0.8110   -0.0974    0.7589    0.2705    0.0341
   -2.3302    1.9110    0.1458    0.3004   -0.2000   -0.0681
   -0.5196   -0.2462    0.0621   -0.0654    0.6354   -0.0340
   -9.7676    0.0801   -0.0075   -0.4611   -0.3454    0.0470
   -6.1074   -0.2389    0.0042    0.1992   -0.4226   -0.1820
    1.8210   -1.7431   -0.0503   -0.2991   -0.2408    0.0089
   10.9626    0.1941   -1.5058   -0.0541    0.0600   -0.4369
   10.2077    0.1703   -0.6322    0.1712   -0.1295    0.6315

e = 10×1

  624.6469
    8.0643
    6.7448
    1.3965
    1.0377
    0.6631
   -0.0000
   -0.0685
   -0.6633
   -5.6586

Evaluate the quality of the reconstruction by calculating the maximum relative difference.

maxerr = max(abs(D-pdist(Y,"cityblock")))/max(D)
maxerr = 
0.2019

One of the eigenvalues is highly negative, which might account for the relatively poor quality of the reconstruction.

Input Arguments

collapse all

Distance or dissimilarity matrix for n points, specified as an n-by-n numeric matrix. You can also specify D as a numeric row vector of length k that contains the n*(n-1)/2 upper triangle elements of the distance or dissimilarity matrix. In this case, the software converts D into a square matrix using the squareform function. D can have one of the following matrix types.

Matrix TypeSymmetricDescription
Euclidean distanceYesPairwise Euclidean distances between n points
Non-Euclidean distanceYesNon-Euclidean pairwise distances between n points (see pdist)
Full dissimilarityYes

Zeros along the diagonal, and positive dissimilarity values off the diagonal

Full dissimilarity (upper triangle form)NoPositive dissimilarity values above the diagonal, and zeros elsewhere
Full similarityYes
  • Ones along the diagonal, and values less than one off the diagonal.

  • cmdscale transforms a similarity matrix to a dissimilarity matrix in such a way that the distances between the points in Y equal or approximate sqrt(1 – D). To use a different transformation, transform the similarities prior to calling cmdscale.

Data Types: single | double

Dimensionality of the embedding, specified as an integer between 1 and n, where n is the number of rows and columns in D. Specify p to reduce the computational burden when n is very large.

When you specify p:

  • If a p-dimensional embedding is possible, then Y has size n-by-p.

  • If only a q-dimensional embedding with q < p is possible, then Y has size n-by-q.

Data Types: single | double

Output Arguments

collapse all

Configuration matrix for the n-by-n matrix D, returned as an n-by-p numeric matrix. The rows of Y correspond to the coordinates of n points in a p-dimensional space, where p < n.

When you do not specify p:

  • If D is a Euclidean distance matrix, its elements are the pairwise distances between the n points, and p is the dimension of the smallest space in which these points can be embedded.

  • If D is a non-Euclidean distance matrix or a dissimilarity matrix, p is the number of positive eigenvalues of Y*Y'. In this case, the reduction to p or fewer dimensions provides a reasonable approximation to D only if the negative eigenvalues of Y*Y' are small in magnitude.

When you specify p:

  • If a p-dimensional embedding is possible, then Y has size n-by-p.

  • If only a q-dimensional embedding with q < p is possible, then Y has size n-by-q.

Eigenvalues of Y*Y', returned as a numeric column vector. If you specify p, then e has length p. Otherwise, e has length n. When D is Euclidean, the first p elements of e are positive, and the rest are zero. If the first k elements of e are much larger than the remaining (n – k) elements, you can use the first k columns of Y as k-dimensional points whose interpoint distances approximate D. This approach can provide a useful dimension reduction for visualization, which is analogous to Principal Component Analysis (PCA).

References

[1] Cox, Trevor F., and Michael A. A. Cox. Multidimensional Scaling. 2nd ed. Monographs on Statistics and Applied Probability 88. Boca Raton: Chapman & Hall/CRC, 2001.

[2] Davison, Mark L. Multidimensional Scaling. Wiley Series in Probability and Mathematical Statistics. New York: Wiley, 1983.

[3] Seber, G. A. F. Multivariate Observations. 1st ed. Wiley Series in Probability and Statistics. Wiley, 1984.

Version History

Introduced before R2006a