How to know if PCA worked?

Emil Ås on 30 Oct 2019
Answered: Emil Ås on 6 Dec 2019
Hi. This question is from a financial econometrics assignment at university.
I have calculated the principal components of a large data set. It has thirty columns, one for each of thirty bonds with maturities from 1 to 30 years. The data are recorded for every day the market has been open, going back to about the 1960s, so the data set contains 8461 rows.
So, this is what I have done so far after importing the data:
T1TT = table2timetable(T1); %creating a timetable of T1 (the data)
cr = corr(T1TT{:,:}); % Calculating the correlation between the different bonds
% calculate eigenvectors and eigenvalues of the correlation matrix
[eigenVectors,eigenValues] = eig(cr);
% eig returns eigenValues as a diagonal matrix (zeros off the diagonal)
eigenValues = diag(eigenValues);
% sort eigenValues in descending order and reorder eigenVectors to match
[eigenValues, sortIdx] = sort(eigenValues, 'descend');
eigenVectors = eigenVectors(:, sortIdx);
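When the eigenvalues are sorted, the eigenvector columns have to be permuted the same way, otherwise each eigenvalue ends up paired with the wrong eigenvector. The sketch below checks that pairing on synthetic data (an assumption, not the bond data): it uses the second output of sort as the permutation, then verifies cr*V = V*diag(lambda) and that the eigenvalues of a correlation matrix sum to its trace, which equals the number of series. Variable names such as common, sortIdx and pairingError are illustrative. corr comes from the Statistics and Machine Learning Toolbox, as in the question; corrcoef is a base-MATLAB alternative.

```matlab
% Synthetic stand-in for the bond data (NOT the assignment data):
% 30 correlated series built from one common random walk plus noise.
rng(0);                                     % reproducible random numbers
n = 1000;
common = cumsum(randn(n, 1));
X = common + 0.5 * randn(n, 30);            % implicit expansion gives n-by-30
cr = corr(X);                               % 30-by-30 correlation matrix

[eigenVectors, D] = eig(cr);
eigenValues = diag(D);

% Sort the eigenvalues and apply the same permutation to the eigenvector columns
[eigenValues, sortIdx] = sort(eigenValues, 'descend');
eigenVectors = eigenVectors(:, sortIdx);

% Each column should still satisfy cr * v = lambda * v
pairingError = norm(cr * eigenVectors - eigenVectors * diag(eigenValues));
% The eigenvalues of a correlation matrix sum to its trace, i.e. the number of series
traceError = abs(sum(eigenValues) - size(cr, 1));
fprintf('pairing error: %.2e, trace error: %.2e\n', pairingError, traceError);
```

Sorting only the eigenvalues (without sortIdx) would leave pairingError far from zero.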
Our next task is to check whether the principal component analysis worked. Here is the relevant text from the assignment:
Finally, you want to make sure that your principal component analysis worked and it really transformed the correlated explanatory variables into uncorrelated principal components. To standardize each of the thirty time series, subtract from each observation the mean of the time series and divide the result by the standard deviation of the time series. Next, multiply the matrix containing all the standardized time series with the matrix of eigenvectors to compute the time series of the thirty principal components. Calculate all possible correlations between the thirty principal components - but do not report it in your solution paper! Instead describe the pattern the correlation matrix shows. Did your principal component analysis work?
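The check described in the assignment rests on a short matrix argument. In this sketch the symbols are chosen for illustration and do not appear in the assignment: Z is the n×30 matrix of standardized series, R their correlation matrix, V the orthogonal matrix of eigenvectors of R, Λ the diagonal matrix of eigenvalues λ_1, …, λ_30, and P = ZV the principal component scores (an ordinary matrix product of the n×30 data with the 30×30 eigenvector matrix).

```latex
R = \frac{1}{n-1} Z^\top Z = V \Lambda V^\top, \qquad V^\top V = I, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_{30})

P = Z V \;\Longrightarrow\; \frac{1}{n-1} P^\top P = V^\top \Big( \frac{1}{n-1} Z^\top Z \Big) V = V^\top V \Lambda V^\top V = \Lambda

\operatorname{corr}(P_i, P_j) = \frac{\Lambda_{ij}}{\sqrt{\lambda_i \lambda_j}} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}
```

Because every column of Z has mean zero, so does every column of P, which makes (1/(n-1))PᵀP the sample covariance of the scores. That covariance is diagonal, so as long as all eigenvalues are positive, the correlation matrix of the thirty principal components should be the identity matrix up to rounding error.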
This is what I did to try to answer the question above:
mean = -0.0061; % the mean of the eigenVectors (seen from workspace)
std = std(eigenVectors); % std of the eigenVectors
standardized = (eigenVectors - mean) / std; % standardizing the time series
multiplied = standardized .* eigenVectors; % multiplying
multiplied_corr = corr(multiplied(:,:)); % finding the correlation matrix (they should now be uncorrelated)
However, the correlation matrix returned in "multiplied_corr" looks strange to me: I expected the principal components to be uncorrelated, but they still seem to be correlated in some way.
Does anyone know another way to answer this question and check whether the PCA worked?

Accepted Answer

Ridwan Alam on 21 Nov 2019
Edited: Ridwan Alam on 21 Nov 2019
Assuming your data is in a table T1 of size 8461x30 (excluding the day/time index):
To standardize each of the thirty time series, subtract from each observation the mean of the time series and divide the result by the standard deviation of the time series.
There are two ways to do this:
standardized_T1 = zscore(table2array(T1));
or,
standardized_T1 = table2array(T1);
standardized_T1 = (standardized_T1 - mean(standardized_T1))./std(standardized_T1);
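Both routes should give the same matrix, since zscore also subtracts the column mean and divides by the column standard deviation computed with the n-1 normalization. A quick check on a synthetic matrix (an assumption standing in for table2array(T1)); zscore requires the Statistics and Machine Learning Toolbox, and names such as A, Z1 and Z2 are illustrative:

```matlab
% Synthetic numeric matrix standing in for table2array(T1) (NOT the bond data)
rng(1);
A = 2 * randn(200, 30) + 5;

Z1 = zscore(A);                             % built-in standardization
Z2 = (A - mean(A)) ./ std(A);               % manual version using implicit expansion

maxDiff = max(abs(Z1(:) - Z2(:)));          % the two should agree to rounding error
colMeans = mean(Z2);                        % each column mean should be ~0
colStds = std(Z2);                          % each column standard deviation should be ~1
fprintf('max difference between the two methods: %.2e\n', maxDiff);
```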
Next, multiply the matrix containing all the standardized time series with the matrix of eigenvectors to compute the time series of the thirty principal components.
[eigen_vector,eigen_values] = eig(cov(standardized_T1),'vector'); % since you standardized, corr() and cov() will give same output
[eigen_values,descending_index] = sort(eigen_values,'descend'); % sorting is optional for your task, I guess
eigen_vector = eigen_vector(:,descending_index); % sorting is performed to make the first PC capture max variance
pca_scores = standardized_T1*eigen_vector; % size should be 8461x30
Calculate all possible correlations between the thirty principal components - but do not report it in your solution paper! Instead describe the pattern the correlation matrix shows. Did your principal component analysis work?
pca_corr = corr(pca_scores);
% it should be a diagonal matrix (in fact the identity: ones on the diagonal,
% off-diagonal entries zero up to rounding error), which means all 30 PCs
% are orthogonal, i.e. "uncorrelated"
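"Diagonal" can be turned into a numeric criterion: the largest absolute off-diagonal correlation should be at the level of floating-point rounding, not something like 0.1. The sketch below runs the whole procedure end to end on synthetic data (two common factors plus noise, assumed purely for illustration) and measures that. It needs zscore, corr and the 'vector' option of eig (R2017b or later); names such as factors, loadings and maxOffDiag are illustrative.

```matlab
% Synthetic stand-in data (NOT the bond data): two common factors plus noise
rng(2);
n = 2000;
factors = randn(n, 2);
loadings = 3 * rand(2, 30);
X = factors * loadings + randn(n, 30);      % n-by-30 correlated series

Z = zscore(X);                              % standardize each column
[V, lambda] = eig(cov(Z), 'vector');        % cov(Z) equals corr(X) here
[lambda, idx] = sort(lambda, 'descend');    % largest variance first
V = V(:, idx);

P = Z * V;                                  % n-by-30 principal component scores
C = corr(P);                                % correlations between all pairs of PCs

offDiag = C - diag(diag(C));                % zero out the diagonal
maxOffDiag = max(abs(offDiag(:)));          % expected to be at rounding-error level
maxDiagErr = max(abs(diag(C) - 1));         % diagonal entries should be 1
fprintf('largest |off-diagonal correlation|: %.2e\n', maxOffDiag);
```

Comparing the original series the same way (corr(X)) shows large off-diagonal values, which is the contrast the assignment asks you to describe.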
Hope this helps!!
%% sanity check using MATLAB's built-in function pca():
[pca_coeff_m,pca_scores_m] = pca(standardized_T1);
% pca_scores_m should match pca_scores above up to the sign of each column
% (eigenvectors are only defined up to sign),
% so the correlations will give the same outcome :)
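Because eig and pca() may return eigenvectors with opposite signs, an element-by-element comparison of the scores can fail even when both routes are correct. A sketch of a sign-aware comparison, on synthetic data (assumed, not the bond data); pca() requires the Statistics and Machine Learning Toolbox, and names such as scoresEig and signs are illustrative:

```matlab
% Synthetic stand-in data (NOT the bond data): two common factors plus noise
rng(3);
n = 1000;
X = randn(n, 2) * (3 * rand(2, 30)) + randn(n, 30);
Z = zscore(X);

% Route 1: eigen-decomposition, sorted by descending eigenvalue
[V, lambda] = eig(cov(Z), 'vector');
[~, idx] = sort(lambda, 'descend');
V = V(:, idx);
scoresEig = Z * V;

% Route 2: built-in pca()
[coeff, scoresPca] = pca(Z);

% Eigenvectors are only defined up to sign: flip each column of the
% eigen-route scores so its eigenvector points the same way as pca's
signs = sign(sum(V .* coeff, 1));           % 1-by-30 vector of +1 / -1
scoresEigAligned = scoresEig .* signs;      % implicit expansion over rows

maxDiff = max(abs(scoresEigAligned(:) - scoresPca(:)));
fprintf('max score difference after sign alignment: %.2e\n', maxDiff);
```

Flipping a column's sign does not change its correlation with any other column except for the sign, so the off-diagonal entries stay zero either way.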

More Answers (1)

Emil Ås on 6 Dec 2019
Hi! Yes Ridwan, I was able to get it right after all. Either way, I appreciate your answer.
