Linear data fitting
Hi everyone,
So, I have two sets of data, and plotting them against each other I get something like this:
It is easy to identify three groups of data, each of which could be fitted by its own linear regression. The problem is that the data points are not ordered in a way that would let me simply break the arrays into three parts to get three separate data sets. The points are all mixed up, and there are even points that don't belong to any of the three groups.
My question is: what is the easiest way to break the data set into smaller segments, each fitted by its own linear regression, while ignoring the data points that don't fit in any group?
Thank you all in advance.
Accepted Answer
More Answers (3)
Fangjun Jiang
26 Sept 2011
Certainly this requires some intelligence. I don't see a simple algorithm that can achieve it. What I suggest is to combine programming with your visual inspection.
You've seen the figure, so you can roughly estimate the fits for those three straight lines. Use those three lines to compute the error for each data point; if a point's error is large for a particular line, exclude that point from that line's group.
With that, you can separate all the data points into three groups and then you can do the accurate fitting.
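This assign-and-refit loop can be sketched as follows. The synthetic data, the initial slope/intercept guesses, and the `tol` threshold are all illustrative assumptions standing in for values you would read off your own figure:

```matlab
% Synthetic data resembling the figure: three lines plus noise
x = rand(1,300)*10;
y = [5*x(1:100), 5*x(101:200)+50, 5*x(201:300)+100] + randn(1,300);
tol = 5;                            % hypothetical outlier threshold

% Rough guesses read off the plot: each row is [slope intercept]
P = [5 0; 5 45; 5 105];
for iter = 1:10
    % Vertical distance from every point to each candidate line (3-by-N)
    E = abs(bsxfun(@minus, bsxfun(@plus, P(:,1)*x, P(:,2)), y));
    [emin, group] = min(E, [], 1);  % assign each point to its closest line
    keep = emin < tol;              % ignore points far from every line
    for k = 1:3
        idx = keep & (group == k);
        P(k,:) = polyfit(x(idx), y(idx), 1);  % refit line k on its group
    end
end
```

After a few iterations, `group` (restricted to `keep`) gives the three point sets for the final accurate fits.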
Teja Muppirala
27 Sept 2011
One possible way, if you have access to the functionality in the Optimization Toolbox, is to treat this as an optimization problem. Find 6 parameters (2 for each line) that minimize the error amongst the points.
Try it out a few times using this script; it works great!
figure;
x1 = randn(1,100);
x2 = 1+randn(1,100);
x3 = 2+randn(1,100);
Ptrue = 3*randn(1,6);
y1 = Ptrue(1)*x1+Ptrue(2);
y2 = Ptrue(3)*x2+Ptrue(4);
y3 = Ptrue(5)*x3+Ptrue(6);
y1(1:10:end) = randn(size(y1(1:10:end))); % Replace every 10th point with an outlier
plot([x1 x2 x3],[y1 y2 y3],'*')
x = [x1 x2 x3];
y = [y1 y2 y3];
% Use sum of absolute differences (not squares) for robustness
Pfun = @(P) sum(min(abs(bsxfun(@minus,[P(1)*x + P(2) ; P(3)*x + P(4) ; P(5)*x + P(6)],y))));
options = optimset('Display','off','LargeScale','off');
options.MaxIter = 1e6;
options.MaxFunEvals = 1e6;
% Do it 10 times just to make sure
fmin = Inf;
for n = 1:10
    [P0,f] = fminunc(Pfun,randn(1,6),options);
    if f < fmin
        fmin = f;
        P_opt = P0;
    end
end
xplot = [min(x), max(x)];
hold on;
plot(xplot,P_opt(1)*xplot + P_opt(2),'r')
plot(xplot,P_opt(3)*xplot + P_opt(4),'r')
plot(xplot,P_opt(5)*xplot + P_opt(6),'r')
Another approach, if you are familiar with image processing, is to use a Hough transform to try and find those lines. For only 3 lines, I think the optimization approach is much easier and more accurate though.
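For completeness, the Hough-transform idea might be sketched like this (requires the Image Processing Toolbox). The image resolution, the synthetic data, and the peak count are illustrative assumptions:

```matlab
% Synthetic data: three lines plus noise
x = rand(1,300)*10;
y = [5*x(1:100), 5*x(101:200)+50, 5*x(201:300)+100] + randn(1,300);

% Rasterize the scatter into a binary image (200x200 is an arbitrary choice)
xi = round(1 + 199*(x - min(x))/(max(x) - min(x)));
yi = round(1 + 199*(y - min(y))/(max(y) - min(y)));
BW = false(200,200);
BW(sub2ind(size(BW), yi, xi)) = true;       % one pixel per data point

[H, theta, rho] = hough(BW);                % standard Hough transform
peaks = houghpeaks(H, 3);                   % the 3 strongest line candidates
lines = houghlines(BW, theta, rho, peaks);  % segments in pixel coordinates
```

Note that the detected lines live in pixel coordinates, so the `rho`/`theta` values (or segment endpoints) would still need to be mapped back to data coordinates, and the resolution and `houghpeaks` settings typically need tuning.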
Richard Willey
27 Sept 2011
From my perspective, the easiest way to solve this one is
- Fit a linear model to the complete data set
- Apply a clustering algorithm to the residuals
Check out the following
%% Generate some data
X1 = 1:100;
X1 = X1';
Y1 = 5*X1;
X2 = 1:100;
X2 = X2'+ 1/3;
Y2 = 5* X2 + 50;
X3 = 1:100;
X3 = X3'+ 2/3;
Y3 = 5*X3 + 100;
X = vertcat(X1,X2,X3);
Y = vertcat(Y1,Y2,Y3);
scatter(X,Y, '.')
%% Apply k-means clustering to the residuals
B = regress(Y,X);
YHat = X*B;
resid = Y - YHat;
index = kmeans(resid,3);
%% Generate three regression models
for i = 1:3
    B(i) = regress(Y(index == i), X(index == i));
end
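One caveat: `regress(Y,X)` with a single predictor column fits a slope through the origin, so the per-group fits above cannot recover the offsets (50 and 100) in this example. A sketch that adds an intercept term, continuing from the `X`, `Y`, and `index` variables defined above:

```matlab
coeffs = zeros(2,3);        % column k will hold [intercept; slope] for group k
for i = 1:3
    Xi = X(index == i);
    % Design matrix with a constant column gives intercept + slope
    coeffs(:,i) = regress(Y(index == i), [ones(size(Xi)) Xi]);
end
```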