Linear regression on training set

I have some data that I want to divide into a training set and a validation set, so that I can do linear regression on the training set to find y0 and r. The training set should contain at least 50% of the data. My code so far is below:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A))
subSet1=A(idx(1:5)) %Trainingset
subSet2=A(idx(6:end)) %Validationset
If I can assume the function is exponential, y(t) = y0*e^(r*t), how do I proceed to plot the training set and find y0 and r?
Thanks for any help!

9 comments

J. Alex Lee on 10 Sep 2020
You have already identified that your regression can be put into linear form, so that's already a big hint for you...
katara on 10 Sep 2020
Yeah, so I tried rewriting the function as log(y) = log(y0) + r*t and then using polyfit(t, log(y), 1), but since y0 is unknown that doesn't work.
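The unknown y0 is not an obstacle here: in log(y) = log(y0) + r*t the term log(y0) is simply the intercept of a straight line in t, so a first-degree polyfit estimates both unknowns at once. A minimal sketch on synthetic, noise-free data (y0True and rTrue are made-up values used purely for illustration, not taken from the thread):

```matlab
% Synthetic check: generate exact exponential data, then recover y0 and r.
y0True = 2;                        % hypothetical value, illustration only
rTrue  = 0.5;                      % hypothetical value, illustration only
tSyn   = 0:5;
ySyn   = y0True * exp(rTrue * tSyn);

c     = polyfit(tSyn, log(ySyn), 1);   % c(1) = slope, c(2) = intercept
rEst  = c(1);                          % the slope estimates r
y0Est = exp(c(2));                     % the intercept estimates log(y0)
```

Because the synthetic data are exactly exponential, rEst and y0Est should agree with rTrue and y0True up to rounding error.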
katara on 10 Sep 2020
Edited: katara on 10 Sep 2020
I just realized I could define a new variable y = log(y) and use polyfit from there. So my code is:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
t1=[1930, 1943, 1966, 1976, 1991];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
But now I have chosen that t1 is the first five years of t, which won't correspond correctly to the randomly chosen values of the training set. Is there a way of choosing five t values that will correspond to the randomly chosen values?
Johannes Hougaard
The five t values that correspond to the randomly chosen values can be selected by indexing t with the idx vector, just as you do for A.
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1 = t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
And to apply your polyfit result you could just use polyval.
% Or you could use
y2 = exp(polyval(c,t));
plot(t,y2);
Adam Danz on 10 Sep 2020
Edited: Adam Danz on 10 Sep 2020
Johannes has the right approach (maybe it can be written as an answer). It can be generalized to a dataset of any size using:
idx = randperm(numel(A));
nTrain = ceil(numel(A)/2);
% nTest = numel(A)-nTrain; % if needed
trainIdx = 1:nTrain;
testIdx = nTrain+1 : numel(A);
trainSet = [A(trainIdx); t(trainIdx)]; % assuming A and t are row vectors
testSet = [A(testIdx); t(testIdx)]; % same assumption
% Then proceed with fitting on the trainSet and measuring
% error on the testSet
Also note that if you're planning on using a more rigorous cross validation, use cvpartition to partition your data.
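As posted, the snippet above builds trainIdx and testIdx from 1:nTrain without going through idx, so the split is not actually random, and the vertical concatenation only works when A is a row vector (in the question A is transposed into a column). A sketch of the same idea with those two points addressed, assuming A is written without the transpose:

```matlab
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];        % row vector (no transpose)
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];   % row vector

idx      = randperm(numel(A));     % random ordering of all indices
nTrain   = ceil(numel(A)/2);       % at least 50% of the data for training
trainIdx = idx(1:nTrain);          % index through idx so the split is random
testIdx  = idx(nTrain+1:end);

trainSet = [A(trainIdx); t(trainIdx)];   % 2-by-nTrain, rows are [A; t]
testSet  = [A(testIdx);  t(testIdx)];    % 2-by-(numel(A)-nTrain)

% With the Statistics and Machine Learning Toolbox, a holdout partition
% could instead come from cvpartition, for example:
%   cv = cvpartition(numel(A), 'HoldOut', 0.5);
%   trainMask = training(cv);  testMask = test(cv);
```

Keeping A and t paired through the same index vector is what guarantees that each value stays with its year.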
katara on 10 Sep 2020
Thank you!
One question for Johannes: how can I plot the polyfit result using polyval? In other problems I have used, for example:
c=polyfit(t, temp, 2)
x=polyval(c,t)
plot(t,temp,'*', t, x)
However, for this problem I tried:
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
lny0=(c(2));
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*',t,p)
And it didn't work. The code you wrote with polyval didn't work either.
The whole code is now:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1=t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
lny0=(c(2));
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*',t,p)
You just need to exponentiate the result of polyval (remember, you took the log), and I would wager the plot you really want is
plot(t,A,'*',t,exp(polyval(c,t)))
Or if I may:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1=t(idx(1:5)); %t values for Trainingset
t2=t(idx(6:end)); %t values for Validationset
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
y0=exp(c(2));
yMdlFn = @(t)(y0*exp(r*t));
% to evaluate on test set
yMdlTest = yMdlFn(t2)
% more comprehensive plot
figure(1); cla; hold on
plot(t1,subSet1,'*')
plot(t2,subSet2,'o')
fplot(yMdlFn,[1929,2009])
But I'd also recommend implementing Adam's generalization to arbitrarily large data sets partitioned into arbitrarily sized training and test sets (although I think the code as posted doesn't work).
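For the "measure error on the test set" step mentioned earlier in the thread, one option is the root-mean-square error on the held-out points. A minimal sketch, assuming RMSE is an acceptable metric (the thread does not prescribe one); trainIx, valIx, rmseVal and rmseLog are names introduced here for illustration:

```matlab
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];

idx     = randperm(numel(A));
trainIx = idx(1:5);                 % training indices
valIx   = idx(6:end);               % validation indices

c      = polyfit(t(trainIx), log(A(trainIx)), 1);   % fit in log space
yMdlFn = @(tt) exp(polyval(c, tt));                 % back-transform to original units

resid   = A(valIx) - yMdlFn(t(valIx));              % validation residuals
rmseVal = sqrt(mean(resid.^2));                     % error in the units of A
rmseLog = sqrt(mean((log(A(valIx)) - polyval(c, t(valIx))).^2));  % error in log units
```

The two numbers answer different questions: rmseVal weights the large, late values most heavily, while rmseLog measures relative error, which is what the log-space fit actually minimizes on the training set.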
Image Analyst on 10 Sep 2020
If you want a log fit, use fitnlm() rather than polyfit().
J. Alex Lee on 10 Sep 2020
I would take linear least squares anywhere I can get it, including in this situation. Linear fitting doesn't require initial guesses, is guaranteed to give a "result", and is faster. Now, you could use the result of the polyfit to start a nonlinear fit, if you want to define the least squares differently. But you're still left with a choice of how to define your residual, so there are a lot more things to worry about if you care at that level with nonlinear fitting.
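The two-step idea described here (linear fit first, then a nonlinear fit started from it) might look like the sketch below. It swaps in base-MATLAB fminsearch instead of the fitnlm function named in the thread, so no toolbox is needed. It also uses all nine points for brevity and shifts time by t0 = t(1) to keep the exponent moderate; both are assumptions of this sketch rather than anything stated in the thread.

```matlab
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];

t0 = t(1);                 % time shift (assumption, for numerical conditioning)
ts = t - t0;

% Step 1: linear least squares in log space, no initial guess needed
c  = polyfit(ts, log(A), 1);
p0 = [c(2), c(1)];         % [log of model value at t0, r]

% Step 2: nonlinear least squares on the untransformed residuals,
% started from the log-space solution
sse  = @(p) sum((A - exp(p(1) + p(2)*ts)).^2);
pOpt = fminsearch(sse, p0);

y0AtT0 = exp(pOpt(1));     % model value at t = t0
r      = pOpt(2);          % growth rate per year
```

The choice of residual is exactly the point raised above: step 1 minimizes squared error in log(A), step 2 minimizes squared error in A, and the two generally give different parameters.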


Answers (1)

Johannes Hougaard on 11 Sep 2020
The five t values that correspond to the randomly chosen values can be selected by indexing t with the idx vector, just as you do for A.
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1 = t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
And to apply your polyfit result you could just use polyval.
% Or you could use
y2 = exp(polyval(c,t));
plot(t,y2);


Asked: 10 Sep 2020
Answered: 11 Sep 2020
