PCA output: coefficients vs loadings

Mathew Guilfoyle on 11 Feb 2013
Edited: Seung Yi Lee on 30 Aug 2021
I would be grateful for some explanation on the output of principal components analysis (pca) from the Statistics Toolbox.
I have a dataset with 150 variables and ~50000 observations.
When I submit this to PCA there is one dominant PC/latent variable that accounts for >95% of the variance.
However, the first column of the output coefficient matrix has very low values for the loading of all the original variables (~0.06). My understanding is that the sum of squared loadings (i.e. the sum of squares of each column of the coefficient matrix) should equal the eigenvalue corresponding to each PC. However, sum(coeff.^2) returns 1 for all columns. This leads me to suspect that the loadings in each column are being scaled.
If I put the same data into SPSS I get the same eigenvalues/% explained but the component loadings on PC1 are now between 0.7 and 0.95.
Could anyone explain why and how these outputs differ?
Thanks

Answers (5)

Juyeong Choi on 21 Dec 2014
So, how do we calculate the loadings for PC1 as obtained in SPSS? Does anyone have an idea?
  1 comment
Yuchun Zhou on 8 Jul 2019
Hi, did you eventually find out how to do the conversion?


Xiaosha Wang on 31 Jul 2015
MATLAB outputs the coefficient matrix, whereas SPSS outputs loadings, defined as the correlations between a given principal component and the original variables. The two outputs (coefficients and loadings) are proportional.
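A minimal sketch of that relationship, using made-up data (the matrix X and the variable names loadings and loadings_from_coeff are illustrative, not from the thread). It computes the correlations directly and then rebuilds them from the pca coefficients. It assumes the Statistics Toolbox and implicit expansion (R2016b or later).

```matlab
% Illustrative data only: 500 observations of 4 correlated variables.
rng(0);
X = randn(500, 4) * [2 1 0 0; 0 1 1 0; 0 0 1 3; 0 0 0 1];

[coeff, score, latent] = pca(X);       % columns of coeff have unit length

% Loadings in the sense used above: correlation between each original
% variable (rows) and each principal component score (columns).
loadings = corr(X, score);

% The same quantity rebuilt from the coefficients: scale column k by
% sqrt(latent(k)) and divide row j by the standard deviation of variable j.
loadings_from_coeff = (coeff .* sqrt(latent)') ./ std(X)';

disp(max(abs(loadings(:) - loadings_from_coeff(:))));   % numerically zero
```

When the variables are standardized first, the standard deviations are all 1, so each column of loadings is simply the matching coefficient column multiplied by the square root of its eigenvalue.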

the cyclist on 11 Feb 2013
Edited: the cyclist on 13 Feb 2013
Disclaimer: I am not an expert on PCA. [EDIT: Proof of this is that I was wrong that MATLAB scales. See Ilya's answer, and my comment to my own answer, below.]
I believe that this difference is due to the fact that MATLAB first "centers and scales" the original data into z-scores. I am guessing that differences in the loadings are going to be related to that transformation. (Maybe a scaling factor of the standard deviation of each variable?)
The Wikipedia page ( http://en.wikipedia.org/wiki/Principal_component_analysis ) is a good resource. The second paragraph has a brief discussion of the scaling.
  1 comment
the cyclist on 13 Feb 2013
Mathew, did you ever resolve this? As Ilya pointed out, I was mistaken that MATLAB also scales the data to z-scores. It may be that SPSS does scale. I could not find definitive documentation online about this. I did see that SAS seems to do the scaling automatically. (It's often a good idea to scale, especially if your variables have very different magnitudes.)


Ilya on 11 Feb 2013
The princomp and pca functions center the data but do not scale it. (In addition, pca lets you turn centering off.)
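A short sketch of what "centers but does not scale" means in practice, on made-up data with very different column scales (X and the coeff_* names are illustrative). It assumes the Statistics Toolbox for pca and zscore, and implicit expansion (R2016b or later).

```matlab
rng(1);
X = randn(200, 3) .* [1 10 100] + [5 -3 40];   % different scales and offsets

coeff_raw      = pca(X);              % default: centers, does not scale
coeff_centered = pca(X - mean(X));    % centering by hand changes nothing
coeff_z        = pca(zscore(X));      % standardize first: PCA of the correlation matrix

% Centering can be switched off with the 'Centered' name-value option.
coeff_uncentered = pca(X, 'Centered', false);

disp(norm(abs(coeff_raw) - abs(coeff_centered)));   % numerically zero
disp(norm(abs(coeff_raw) - abs(coeff_z)));          % clearly nonzero
```

Standardizing with zscore before calling pca is one way to get a correlation-matrix PCA, which is a plausible source of the difference from other packages, though the thread does not confirm what SPSS does.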
The easiest way to understand PCA is using eigenvalue decomposition of the covariance matrix Sigma:
Sigma = V*Lambda*V'
Lambda is the diagonal matrix of eigenvalues. V is an orthonormal matrix of coefficients. Orthonormality implies that the 2-norm of every column is 1.
This is what the MATLAB implementation does. I am not familiar with the SPSS implementation.
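The decomposition can be checked numerically. The sketch below uses made-up data (X and the names lambda, order and column_norms are illustrative) and compares eig on the covariance matrix with the output of pca. By default pca uses an SVD rather than eig, so the two agree only up to the sign of each column and floating-point error.

```matlab
rng(2);
X = randn(300, 5) * randn(5);          % illustrative correlated data

Sigma = cov(X);                        % cov removes the column means itself
[V, Lambda] = eig(Sigma);              % Sigma = V*Lambda*V'

% Do not rely on the order eig returns; sort descending to match pca.
[lambda, order] = sort(diag(Lambda), 'descend');
V = V(:, order);

[coeff, ~, latent] = pca(X);

column_norms = sqrt(sum(V.^2, 1));     % 2-norm of every column

disp(max(abs(lambda - latent)));       % eigenvalues match latent
disp(norm(abs(V) - abs(coeff)));       % eigenvectors match coeff up to sign
disp(column_norms);                    % all ones
```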

Seung Yi Lee on 30 Aug 2021
Edited: Seung Yi Lee on 30 Aug 2021
Many years after the original question was posted, I ran into the same problem and figured it out.
The coefficients returned by pca are normalized to unit length, so they differ from the loadings by a factor of the square root of the corresponding eigenvalue. Converting them to unscaled loadings with the equation below worked for me.
unscaled_loading = coeff.*sqrt(latent)'
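A sketch that applies this formula to made-up data and checks the sum-of-squares property raised in the question. The data and the names Z, sum_sq_coeff, sum_sq_loading and r are illustrative. The zscore step is an assumption: it makes pca work on the correlation matrix, which is what the correlation-style loadings reported by SPSS would correspond to, although the thread does not confirm the SPSS settings.

```matlab
rng(3);
X = randn(1000, 6) + 2*randn(1000, 1);   % six variables sharing one strong common factor

Z = zscore(X);                           % assumption: standardize before PCA
[coeff, score, latent] = pca(Z);

unscaled_loading = coeff .* sqrt(latent)';       % formula from this answer

sum_sq_coeff   = sum(coeff.^2, 1);               % 1 for every column
sum_sq_loading = sum(unscaled_loading.^2, 1);    % equals latent'

% For standardized data these loadings are also the correlations
% between the variables and the component scores.
r = corr(Z, score);

disp(max(abs(sum_sq_loading - latent')));        % numerically zero
disp(max(abs(r(:) - unscaled_loading(:))));      % numerically zero
```

As rough arithmetic for the original question, under the same standardization assumption: with 150 variables and PC1 explaining 95% of the variance, latent(1) is about 142.5 and its square root is about 11.9, so a coefficient near 0.06 becomes a loading near 0.7.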
