using 'als' algorithm in pca

Sagar on 9 Jul 2015
Answered: Purvaja on 6 Mar 2025
I am running PCA with the 'als' algorithm, as shown below. When I request a specific number of components (1 here), the 'explained' output reports 100% explained variance. How is that possible? Shouldn't there be more than one component, since I have more than one variable? And shouldn't 'explained' give the percentage of variance explained by the first component only?
However, when I use 'eig' as the algorithm, it gives the percentage of variance explained by each component, which is what I expected. Could someone explain this?
[COEFF, SCORE, latent, tsquared, explained, mu1] = pca(combined_aod_wind_raw, 'Centered', 'off', 'NumComponents', 1, 'Algorithm', 'als');

Answers (1)

Purvaja on 6 Mar 2025
Hi @Sagar,
To understand why the alternating least squares ('als') algorithm reports 100% explained variance while the eigenvalue decomposition ('eig') algorithm does not, let's look at how each one works:
  1. When you perform PCA using the "eig" algorithm, MATLAB computes the full eigenvalue decomposition of the data's covariance matrix. This produces a complete set of principal components, and the percentage of variance each component explains is its eigenvalue divided by the sum of all eigenvalues. Even if you request only one component, the reported "explained" values are fractions of the total variance present in the data.
  2. In contrast, the "als" algorithm iteratively approximates your data with a low-rank model. When you specify a single component, it finds the best one-dimensional approximation by minimizing the reconstruction error, and the variance is normalized within that approximation. Because the model contains only one component, that component is reported as accounting for 100% of the variance of the approximated data, even though it may not capture 100% of the total variance in the original dataset.
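The two behaviours can be reproduced with a small sketch. The matrix X below is synthetic data made up for illustration (it is not the data from the question), and default centering is used. The sketch assumes the Statistics and Machine Learning Toolbox is installed. It only prints what each call returns, so you can compare the outputs on your own release.

```matlab
% Sketch: compare the 'latent' and 'explained' outputs of 'eig' and 'als'
% when a single component is requested. X is made-up illustrative data.
rng(0);                                      % reproducible random numbers
X = randn(200, 3) * [3 0 0; 1 2 0; 0 1 1];   % 200 observations, 3 correlated variables

% One requested component, eigenvalue decomposition
[~, ~, latentEig, ~, explainedEig] = pca(X, 'NumComponents', 1, 'Algorithm', 'eig');

% One requested component, alternating least squares
[~, ~, latentAls, ~, explainedAls] = pca(X, 'NumComponents', 1, 'Algorithm', 'als');

% Show how many variances each call returns and the reported percentages
fprintf('eig: %d latent value(s), explained = %s\n', ...
    numel(latentEig), mat2str(explainedEig.', 4));
fprintf('als: %d latent value(s), explained = %s\n', ...
    numel(latentAls), mat2str(explainedAls.', 4));
```

In both cases the percentages sum to 100. What differs is the set of components over which that 100% is distributed.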
Thus, the difference in the variance percentages arises from the way each algorithm computes and normalizes variance:
'eig' Algorithm:
  1. Computes a full decomposition of the covariance matrix.
  2. Variance explained is based on the complete variance distribution across all components.
  3. The reported percentage for a single component shows its share relative to the total variance in the original data.
'als' Algorithm:
  1. Uses an iterative method to derive a low-rank approximation.
  2. When limited to one component, the algorithm normalizes the variance within the approximated subspace.
  3. It then reports that the single component explains 100% of the variance in that approximation, not the full data.
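If you want the share of the total variance captured by the single 'als' component, one option is to compute it yourself from the scores. The following is a minimal sketch under stated assumptions: the data are centered (the default, unlike the 'Centered', 'off' call in the question), there are no missing values, and X is made-up illustrative data. The variable names totalVar and shareOfTotal are chosen here for illustration only.

```matlab
% Sketch: share of the ORIGINAL total variance captured by one ALS component.
% Assumes default centering and no NaNs in X; X is illustrative data.
rng(0);                                      % reproducible random numbers
X = randn(200, 3) * [3 0 0; 1 2 0; 0 1 1];   % 200 observations, 3 correlated variables

% Request one component with the ALS algorithm; keep only the scores
[~, scoreAls] = pca(X, 'NumComponents', 1, 'Algorithm', 'als');

% Total variance of the original data = sum of the per-variable variances
totalVar = sum(var(X));

% Variance along the single component, as a percentage of the total
shareOfTotal = 100 * var(scoreAls(:, 1)) / totalVar;
fprintf('Share of total variance: %.2f%%\n', shareOfTotal);
```

For complete data like this, the result should be close to the first entry of 'explained' returned by the 'eig' algorithm, since both describe the variance along the leading component relative to the whole dataset.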
For more details, you can refer to the following resources:
  1. pca: https://www.mathworks.com/help/stats/pca.html
  2. eig: https://www.mathworks.com/help/matlab/ref/eig.html
For "als", you can find an example on the pca documentation page, or open it directly by entering the following command in the MATLAB Command Window:
openExample('stats/PCAUsingALSforMissingDataExample')
You can also open the release-specific documentation for pca and eig, respectively, with these commands in the MATLAB Command Window:
web(fullfile(docroot, 'stats/pca.html'))
web(fullfile(docroot, 'matlab/ref/eig.html'))
Hope this answers your question!
