Transferring any point in PC space to original space

Dear experts,
I have a difficult question for you. I have a dataset with 6 variables and 27 cases. I ran PCA and plotted the result. Afterwards I drew a circle around it that includes 95% of the points (the circle itself is irrelevant here). Next I created 8 new points (A-D and W-Z), as you can see in the following image. Now I want to do a PCA reconstruction for these 8 points, because I want to know what values the variables have at these points.
I would be very glad if you could tell me how I can handle this problem. Thanks in advance.
To make it clear once more: I had 6 variables at first and then kept 2 PCs. Now I have 8 new points and I need to know what values the 6 variables have for them. I hope it's possible.
Edit: I have already found a formula that seems related, but to be honest I can't quite tell what I should do with it in my case.
Formula I found:
PCA reconstruction = PC scores * Eigenvectors + Mean
Kind Regards TG

5 comments

the cyclist
the cyclist on 16 Sep 2021
I am a little confused about the 8 new points, so let's clarify the terminology.
In your original data, you had 27 observations measured across 6 variables. Then, you did PCA, and have plotted component 1 and component 2. Importantly, that plot is in principal component space (not the original variable space).
Then, it looks like you are constructing your 8 new points in a way that they are defined in the PC space, at regular intervals. For example, Point W has coordinates [PC1,PC2] = [17, 0], approximately.
Now, I am not 100% sure what you mean by "I want to do PCA reproduction". Are you asking what values of the original variables would have given those 8 new points, if you applied the same PCA transformation to them?
If that is not what you mean, maybe you could clarify.
Also, FYI, adding a user's name as a tag does not notify them. (Tags are for topics, not users.) Instead, mention them with the @ sign, like this: @Tom.
Tom
Tom on 16 Sep 2021
Edited: Tom on 16 Sep 2021
Thanks for the info about the tag, but when I typed @the cyclist you didn't show up. Never mind. Yes, you are right: I want to know what values the original variables have for the 8 points.
E.g. I have 6 variables (var1, var2, var3, var4, var5, var6) and 27 observations. I do PCA and then plot the points in PC space, basically everything as you said. Now I add these 8 points (A-D, W-Z) and I want to know what values var1, var2, var3, var4 and so on have for each of those 8 points. I think you got everything right. Maybe you can help me with that. Sadly I cannot give more information on the dataset, as I work for a military company and it's confidential data. I hope you still understand what I mean. Thanks a lot for helping me!
Tom
Tom on 16 Sep 2021
And I still don't know how to Tag you as your name has an empty space in it. Haha
We don't need your confidential data. Can you make up some generic, non-proprietary data and attach that? And we're still not sure what you want. OK, so you have 6 variables and 27 observations. So what do you want to know? Do you just want 6 PC variables? If so, why? What are you going to do with them? Or do you want to model the data and use the 6 variables to predict some kind of output value?
Okay, I'll try, but I think that @the cyclist already almost got it right.
I´ll start with my data:
dataset = readtable(Exampledata);
data = table2array(dataset(:,4:9)); % now I have a 27x6 matrix with 6 variables and 27 observations
data = data - mean(data);
[coeff, score, ~, ~, explained, mu] = pca(data);
figure; % now I'm plotting my data
hold on;
plot1 = plot(score(:,1), score(:,2),'r.');
set (plot1, 'Markersize', 16);
widthandheight_cosy = 25
set(gca, 'XLim', [-widthandheight_cosy, widthandheight_cosy], 'YLim',[-widthandheight_cosy,widthandheight_cosy], 'Box','on' );
axis square;
% then I plot the circle, but that's not relevant here
XtremeW = [radius_circle 0]; %plotting the extreme points
plot(XtremeW(:,1),XtremeW(:,2),'*black');
text(radius_circle,0,' W');
XtremeY = [-radius_circle 0];
plot(XtremeY(:,1),XtremeY(:,2),'*black');
text(-radius_circle,0,' Y');
XtremeX = [0 radius_circle];
plot(XtremeX(:,1),XtremeX(:,2),'*black');
text(0,radius_circle,' X');
XtremeZ = [0 -radius_circle];
plot(XtremeZ(:,1),XtremeZ(:,2),'*black');
text(0,-radius_circle,' Z');
XtremeA = [XandYkoord XandYkoord];
plot(XtremeA(:,1),XtremeA(:,2),'*black');
text(XandYkoord,XandYkoord,' A');
XtremeB = [-XandYkoord XandYkoord];
plot(XtremeB(:,1),XtremeB(:,2),'*black');
text(-XandYkoord,XandYkoord,' B');
XtremeC = [XandYkoord -XandYkoord];
plot(XtremeC(:,1),XtremeC(:,2),'*black');
text(XandYkoord,-XandYkoord,' C');
XtremeD = [-XandYkoord -XandYkoord];
plot(XtremeD(:,1),XtremeD(:,2),'*black');
text(-XandYkoord,-XandYkoord,' D');
So that is basically every important part of my script. Now I would like to get values of the original variables 1 to 6 (from the table that I loaded at the beginning) for all 8 new points XtremeA to XtremeZ. In other words, I want a new table with the 8 points as observations and the 6 original variables as variables, with a value of each variable for each point. I hope it makes sense now.
I will attach an Excel document that looks similar to the one I used.


Accepted answer

the cyclist
the cyclist on 16 Sep 2021
Borrowing the first few lines of code from my PCA tutorial ...
rng 'default'
M = 7; % Number of observations
N = 5; % Number of variables observed
% Made-up data
X = rand(M,N);
% De-mean (MATLAB will de-mean inside of PCA, but I want the de-meaned values later)
X = X - mean(X); % Use X = bsxfun(@minus,X,mean(X)) if you have an older version of MATLAB
% Do the PCA
[coeff,score,latent,~,explained] = pca(X);
It is noted there that coeff transforms the (de-meaned) data from the original space to the PC space:
dataInPrincipalComponentSpace = X*coeff;
If we have data in the principal component space, we can transform back to the original space like this:
X_again = dataInPrincipalComponentSpace*inv(coeff); % Will be the same as X (within floating point error)
That particular line of code will transform all of the original data points back from PC space to the original coordinates. Each row of dataInPrincipalComponentSpace is the coordinates of one of the original data points.
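As a side note on the back-transformation: the columns of coeff returned by pca are orthonormal, so when coeff is square its transpose coeff' is the same matrix as inv(coeff). The following is a minimal sketch that checks this numerically. It assumes the Statistics and Machine Learning Toolbox (for pca) and uses made-up data of the same shape as above; the variable names orthoErr, roundTripErr and invVsTrErr are only illustrative.

```matlab
% Sketch: check that coeff' can stand in for inv(coeff) when coeff is square.
rng('default');
X = rand(7,5);                   % made-up data: 7 observations, 5 variables
X = X - mean(X);                 % de-mean, so that score equals X*coeff
[coeff,score] = pca(X);

% If the columns of coeff are orthonormal, coeff'*coeff is the identity.
orthoErr = norm(coeff'*coeff - eye(size(coeff,2)));

X_inv = score*inv(coeff);        % back-transform with the explicit inverse
X_tr  = score*coeff';            % back-transform with the transpose

roundTripErr = max(abs(X_tr(:)  - X(:)));     % transpose version vs original data
invVsTrErr   = max(abs(X_inv(:) - X_tr(:)));  % inverse version vs transpose version

fprintf('orthoErr = %.2e, roundTripErr = %.2e, invVsTrErr = %.2e\n', ...
        orthoErr, roundTripErr, invVsTrErr);
```

All three numbers should be at the level of floating-point error. The transpose is usually preferred because it also works when only some of the columns of coeff are kept, in which case coeff is not square and inv(coeff) is not defined.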
If you want to transform some other points, then just use those points' coordinates as rows. Here, I'll just choose those coordinates at random:
random_point_in_pc_space = rand(2,N); % Randomly chosen coordinates for two points in the 5-dimensional PC space
random_point_in_orginal_space = random_point_in_pc_space * inv(coeff); % Same random point, in original coordinate system
Instead of random points, you'll want to use the coordinates of your points (A, B, etc).
A wrinkle in your case is that your points are only specified by the first two PC dimensions, PC1 and PC2. So, your W could be
W = [17, 0, 0, 0, 0, 0]; % Coordinates of one possible W
but it could also be
W = [17, 0, 2, -3, 5, -7]; % Coordinates of a different possible W, with the same PC1 and PC2
In fact, an infinite number of points would project from your 6-dimensional space to your point W in PC coordinates, which means there are also an infinite number of data points from the original space that would transform to W.
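To make that ambiguity concrete, here is a small sketch under stated assumptions: the 27x6 data are random stand-ins (not the poster's data), pca comes from the Statistics and Machine Learning Toolbox, and the two W vectors are the ones written above. It maps both candidates back to the original space and then projects them forward again.

```matlab
% Sketch: two different points in the original space share the same PC1 and PC2.
rng('default');
X = rand(27,6);                        % made-up stand-in for a 27x6 dataset
[coeff,~,~,~,~,mu] = pca(X);           % pca centers internally and returns the mean mu

W1 = [17 0 0 0 0 0];                   % PC3..PC6 set to zero
W2 = [17 0 2 -3 5 -7];                 % same PC1 and PC2, different PC3..PC6

x1 = W1*coeff' + mu;                   % W1 in the original 6-variable space
x2 = W2*coeff' + mu;                   % W2 in the original 6-variable space

% Project both back into PC space and compare the first two coordinates.
p1 = (x1 - mu)*coeff;
p2 = (x2 - mu)*coeff;
disp([p1(1:2); p2(1:2)]);              % both rows are [17 0] up to rounding

% Setting PC3..PC6 to zero is the same as using only the first two columns of coeff.
x1_short = W1(1:2)*coeff(:,1:2)' + mu;
```

Here x1 and x2 are clearly different vectors in the original space, yet both land on the same spot in a PC1/PC2 plot. Choosing zeros for the remaining components picks, among all those candidates, the one that lies in the plane spanned by the first two principal axes.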
I don't know your application, so I can't help you interpret the implications for you.

3 comments

Tom
Tom on 17 Sep 2021
Edited: Tom on 17 Sep 2021
Thanks for your answer @the cyclist. I still don't quite get it. I have explained the case in a little more detail above.
I mean, of course I understand that coeff transforms the data to PC space, and I also understand why you have to multiply the data by inv(coeff) to get it from PC space back to the original space.
But if I now for example have my point
XtremeW = [16 0 0 0 0 0]
It should normally not really matter what the last four coordinates are, as they won't change the result in a drastic way. The last four values (so the last 4 PCs) only describe less than 20% of the variance, so why not simply put 0 there?
Next step would be:
XtremeW.*inv(coeff) %I think you also need +mean(data) here as pca subtracts it.
So now I don't quite understand why this doesn't give me reasonable values for the original variables. Correct me if I am wrong again, but as far as I know the PCs include every single variable, so every PC is influenced by every variable. In theory that should mean that the value of a single PC (preferably PC1, as it describes most of the variance) for any point in PC space should be enough to get reasonable values for the six original variables. Am I wrong?
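Two details in the line above may explain the unreasonable values. First, `.*` is element-wise multiplication; the back-transformation needs the matrix product `*`. Second, the script earlier in this thread de-means data before calling pca, so the returned mu is essentially zero and the mean of the raw data has to be added back instead. Below is a minimal sketch of the whole step for the 8 points. The data are random stand-ins, the radius r and the diagonal coordinate d are hypothetical placeholders for radius_circle and XandYkoord, and var1 to var6 are placeholder variable names.

```matlab
% Sketch: map 8 points given in the PC1/PC2 plane back to the 6 original variables.
rng('default');
raw = rand(27,6);                          % made-up stand-in for the real 27x6 data
[coeff,~,~,~,~,mu] = pca(raw);             % pass raw data, so mu is the true column mean

r = 1;                                     % hypothetical circle radius
d = r/sqrt(2);                             % hypothetical diagonal coordinate
pts2  = [ r  0;  0  r; -r  0;  0 -r; ...   % W, X, Y, Z
          d  d; -d  d;  d -d; -d -d];      % A, B, C, D
names = {'W','X','Y','Z','A','B','C','D'};

ptsPC = [pts2, zeros(8,4)];                % set PC3..PC6 to zero
orig  = ptsPC*coeff' + mu;                 % matrix product (*), not element-wise (.*)

T = array2table(orig, ...
    'VariableNames', {'var1','var2','var3','var4','var5','var6'}, ...
    'RowNames', names);
disp(T);

% Sanity check: projecting the result forward recovers the chosen PC coordinates.
back = (orig - mu)*coeff;
```

Each row of T is one of the 8 points expressed in the original variables. As discussed in the answer, this is only one of infinitely many consistent choices, namely the one with all higher PCs set to zero.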
Tom
Tom on 17 Sep 2021
Edited: Tom on 17 Sep 2021
So I just tried a few things and figured out that I was right: for my purposes it doesn't matter which values I put in for the other four PCs. I need extreme cases to optimise my project, and I still get my extreme cases even if I put in 0 for all the other PCs.
I think that solves my problem. I'll tag you if I find out that I am wrong.
THANKS a lot for your help @the cyclist
Is it possible to rate you somewhere?
the cyclist
the cyclist on 17 Sep 2021
I'm glad it worked out.
Accepting and upvoting answers is the way to "rate" contributors here. No other rating required. :-)


More answers (1)

Hello everyone,
I have a question about PCA. I'm working with EEG: I took EEG data, applied EEMD to get IMFs, and then applied PCA to the IMFs.
[coeff,score,latent,~,explained] = pca(modos);
dataInPrincipalComponentSpace = modos*coeff;
X_again = dataInPrincipalComponentSpace*inv(coeff)';
For me, 2 or 3 PCs are enough to retrieve the original data. I have tried the lines above, but I can't get it to work. Please suggest how to do it.

8 comments

the cyclist
the cyclist on 14 Dec 2021
Edited: the cyclist on 14 Dec 2021
If you only need 2 or 3 principal components, then the most common thing to do would be to simply use the first 2 or 3 columns of dataInPrincipalComponentSpace as the new dataset. You don't need to transform back to the original space (which is what the original poster here needed to do).
I suggest that you take a look at the PCA tutorial that I wrote, which has a more complete explanation of how to use PCA. Then, if you still have questions, I suggest you either
  • make a comment on that thread (not an "answer", as you did here), OR
  • make a whole new question
Sir still I have a doubt on that. Please share your mail ID so that I will attach everything
You can create a new Question and attach everything there.
BOMMALA SILPA
BOMMALA SILPA on 15 Dec 2021
Edited: Walter Roberson on 15 Dec 2021
After EEMD on my EEG signal, I got IMFs of size 8x251.
The number of principal components retained for the reconstruction of the clean EOG is based on a threshold value derived from the scree plot. I'm unable to do this exactly.
I followed this procedure:
[coeff, score, latent, tsquared, explained,mu]=pca(IMF,'NumComponents',3);
% IMFs reconstruction
Re_IMF=score * coeff' + mu;
From these IMFs I have to use only a few to get the clean EOG. Please suggest how to do it,
and how to make a scree plot of eigenvalues against principal components.
In this comment in my PCA tutorial, I explain how to make a scree plot from the output of pca().
Thank you sir. I have got the scree plot
I think my question was not clear.
I have an EEG signal and I want to extract the EOG activity from it.
EEMD was performed on the contaminated EEG signal to get the IMFs. After performing PCA on the IMFs, we determined the principal components and arranged them in decreasing order of the variance they explain.
Only 2 or 3 PCs are sufficient to extract the EOG features from the data. This is what I have to do.
I have written code like this:
load('sc4002e0_recm.mat');
% EEMD
Nstd=0.3*std(X);
NR=100;
MaxIter=10;
[modos its]=eemd(X,Nstd,NR,MaxIter);
for i=1:K
IMF(:,i)=modos(i,:)';
end
%% PCA
[coeff, score, latent, tsquared, explained, mu]=pca(IMF); % mu is requested because the reconstruction formula uses it
Then what is the process to extract the EOG from the EEG using 2 or 3 PCs? I have also tried this formula:
Re_IMF=score * coeff' + mu;
but I'm not getting the expected results.
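One thing worth noting about the formula above: with all components kept, score*coeff' + mu reproduces the input exactly, so nothing is filtered out. To reconstruct from only the first k components, restrict both score and coeff to their first k columns. The sketch below illustrates this on random stand-in data of the same shape as the transposed IMF matrix (251 samples by 8 IMFs); k = 3 is an arbitrary choice here. It only shows the mechanics of a truncated reconstruction and makes no claim about whether the result isolates EOG activity.

```matlab
% Sketch: reconstruct data from the first k principal components only.
rng('default');
IMF = randn(251,8);                          % made-up stand-in: 251 samples x 8 IMFs
[coeff,score,latent,~,explained,mu] = pca(IMF);

k = 3;                                       % number of components to keep (assumption)
IMF_k    = score(:,1:k)*coeff(:,1:k)' + mu;  % truncated reconstruction
IMF_full = score*coeff' + mu;                % all components: identical to the input

fullErr = max(abs(IMF_full(:) - IMF(:)));    % ~0, floating-point error only

% Relative error of the truncated reconstruction, measured on de-meaned data.
relErr = norm(IMF_k - IMF,'fro') / norm(IMF - mu,'fro');

% The squared relative error equals the share of variance in the dropped components.
droppedShare = 1 - sum(explained(1:k))/100;
fprintf('fullErr = %.2e, relErr^2 = %.4f, dropped variance = %.4f\n', ...
        fullErr, relErr^2, droppedShare);
```

The cumulative sum of explained (or a plot of latent against component index) is one way to pick k from a threshold.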
Sorry, I don't know the answer to your question.


Release

R2019b

Asked:

Tom
on 16 Sep 2021

Commented:

on 17 Dec 2021
