Linear data fitting
Hi everyone,
So, I have two sets of data, and plotting them against each other I get something like this:
It is easy to identify three groups of data, each of which could be fitted by its own linear regression. The problem is that the data points are not ordered in a way that would let me simply break the arrays into three parts to get three separate data sets. The points are all mixed up, and there are even points that don't belong to any of the three groups.
My question is: what is the easiest way to break the data set into smaller segments, each fitted by its own linear regression, while ignoring the data points that don't fit in any group?
Thank you all in advance.
Accepted Answer
More Answers (3)
Fangjun Jiang
26 Sept 2011
Certainly this requires some intelligence. I don't see a simple algorithm that can achieve it. What I suggest is to combine programming with your visual inspection.
You've seen the figure, so you can roughly estimate the fits for those three straight lines. Use those three lines to compute the error for each data point; if a point's error is large for a particular line, exclude that point from that line's group.
With that, you can separate all the data points into three groups and then you can do the accurate fitting.
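This assign-and-refit loop can be sketched as follows. The synthetic data, the initial slope/intercept guesses, and the `tol` threshold are all illustrative assumptions standing in for values you would read off your own figure:

```matlab
% Synthetic data resembling the figure: three lines plus noise
x = rand(1,300)*10;
y = [5*x(1:100), 5*x(101:200)+50, 5*x(201:300)+100] + randn(1,300);
tol = 5;                            % hypothetical outlier threshold

% Rough guesses read off the plot: each row is [slope intercept]
P = [5 0; 5 45; 5 105];
for iter = 1:10
    % Vertical distance from every point to each candidate line (3-by-N)
    E = abs(bsxfun(@minus, bsxfun(@plus, P(:,1)*x, P(:,2)), y));
    [emin, group] = min(E, [], 1);  % assign each point to its closest line
    keep = emin < tol;              % ignore points far from every line
    for k = 1:3
        idx = keep & (group == k);
        P(k,:) = polyfit(x(idx), y(idx), 1);  % refit line k on its group
    end
end
```

After a few iterations, `group` (restricted to `keep`) gives the three point sets for the final accurate fits.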
Teja Muppirala
27 Sept 2011
One possible way, if you have access to the functionality in the Optimization Toolbox, is to treat this as an optimization problem. Find 6 parameters (2 for each line) that minimize the error amongst the points.
Try it out a few times using this script; it works great!
figure;
x1 = randn(1,100);
x2 = 1+randn(1,100);
x3 = 2+randn(1,100);
Ptrue = 3*randn(1,6);
y1 = Ptrue(1)*x1+Ptrue(2);
y2 = Ptrue(3)*x2+Ptrue(4);
y3 = Ptrue(5)*x3+Ptrue(6);
y1(1:10:end) = randn(size(y1(1:10:end))); % Replace every 10th point with an outlier
plot([x1 x2 x3],[y1 y2 y3],'*')
x = [x1 x2 x3];
y = [y1 y2 y3];
% Use sum of absolute differences (not squares) for robustness
Pfun = @(P) sum(min(abs(bsxfun(@minus,[P(1)*x + P(2) ; P(3)*x + P(4) ; P(5)*x + P(6)],y))));
options = optimset('Display','off','LargeScale','off');
options.MaxIter = 1e6;
options.MaxFunEvals = 1e6;
% Do it 10 times just to make sure
fmin = Inf;
for n = 1:10
    [P0,f] = fminunc(Pfun,randn(1,6),options);
    if f < fmin
        fmin = f;
        P_opt = P0;
    end
end
xplot = [min(x), max(x)];
hold on;
plot(xplot,P_opt(1)*xplot + P_opt(2),'r')
plot(xplot,P_opt(3)*xplot + P_opt(4),'r')
plot(xplot,P_opt(5)*xplot + P_opt(6),'r')
Another approach, if you are familiar with image processing, is to use a Hough transform to try and find those lines. For only 3 lines, I think the optimization approach is much easier and more accurate though.
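For completeness, the Hough-transform idea might be sketched like this (requires the Image Processing Toolbox). The image resolution, the synthetic data, and the peak count are illustrative assumptions:

```matlab
% Synthetic data: three lines plus noise
x = rand(1,300)*10;
y = [5*x(1:100), 5*x(101:200)+50, 5*x(201:300)+100] + randn(1,300);

% Rasterize the scatter into a binary image (200x200 is an arbitrary choice)
xi = round(1 + 199*(x - min(x))/(max(x) - min(x)));
yi = round(1 + 199*(y - min(y))/(max(y) - min(y)));
BW = false(200,200);
BW(sub2ind(size(BW), yi, xi)) = true;       % one pixel per data point

[H, theta, rho] = hough(BW);                % standard Hough transform
peaks = houghpeaks(H, 3);                   % the 3 strongest line candidates
lines = houghlines(BW, theta, rho, peaks);  % segments in pixel coordinates
```

Note that the detected lines live in pixel coordinates, so the `rho`/`theta` values (or segment endpoints) would still need to be mapped back to data coordinates, and the resolution and `houghpeaks` settings typically need tuning.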
Richard Willey
27 Sept 2011
From my perspective, the easiest way to solve this one is
- Fit a linear model to the complete data set
- Apply a clustering algorithm to the residuals
Check out the following
%% Generate some data
X1 = 1:100;
X1 = X1';
Y1 = 5*X1;
X2 = 1:100;
X2 = X2'+ 1/3;
Y2 = 5* X2 + 50;
X3 = 1:100;
X3 = X3'+ 2/3;
Y3 = 5*X3 + 100;
X = vertcat(X1,X2,X3);
Y = vertcat(Y1,Y2,Y3);
scatter(X,Y, '.')
%% Apply k-means clustering to the residuals
B = regress(Y,X);
YHat = X*B;
resid = Y - YHat;
index = kmeans(resid,3);
%% Generate three regression models
for i = 1:3
    B(i) = regress(Y(index == i), X(index == i));
end
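One caveat: `regress(Y,X)` with a single predictor column fits a slope through the origin, so the per-group fits above cannot recover the offsets (50 and 100) in this example. A sketch that adds an intercept term, continuing from the `X`, `Y`, and `index` variables defined above:

```matlab
coeffs = zeros(2,3);        % column k will hold [intercept; slope] for group k
for i = 1:3
    Xi = X(index == i);
    % Design matrix with a constant column gives intercept + slope
    coeffs(:,i) = regress(Y(index == i), [ones(size(Xi)) Xi]);
end
```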