how to use maximum likelihood estimation (MLE) to deal with censored data to get its linear regression
7 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Dear guys,
The matlab code is shown below. x and y are experimental data and plotted in figure1 with blue stars. The relationship between x and y is supposed to be linear following the equation y=x and it is plotted in figure1 with blue line. But due to limitation of data acquisition method, the last 5 data in y vector are censored to 15. When using polyfit function (least square estimation?) to get the linear equation (result shown in figure1 with red line), the result is slightly wrong.
Someone has told me that maximum likelihood method could more or less fixed this problem. I guess I should use function MLE. But could someone tell me how to use it?
Thank you very much!
David
%%%plot experimental measured data (somehow censored).
x=[0:1:20];
y=[-0.709 1.017 2.50 2.93 3.08 4.12 6.16 7.59 8.25 8.10 9.57 11.4 11.3 13.4 13.9 14.3 15 15 15 15 15];
figure(1);plot(x,y,'*');hold on;
%%%plot correct (original) linear regression line
z=x;
figure(1);plot(x,z);
%%plot censored linear regression line
p=polyfit(x,y,1);
figure(1);plot(x,p(2)+p(1)*x,'r');
legend('censored data','correct linear regression','wrong linear regression')
0 comentarios
Respuestas (2)
the cyclist
el 24 de Mzo. de 2011
Let me start by thanking you for such a well written question, with functioning code! (It could have been made slightly better if you had used the markup tools to format the code.) I have been despairing lately under a rash of really hard-to-understand questions.
POLYFIT uses the method of least squares, which results in the maximum likelihood estimates for the coefficients (under the assumption of normally distributed errors). So, I would not strictly say that what you need is maximum likelihood, because you are already doing that.
Rather, what you might need is to do a non-linear fit, where your fitting function is linear for data up to 15, then equal to 15 for larger values. If you have the Statistics Toolbox, you can do this type of fit with the NLINFIT function.
It might be easier, though, to simply self-censor the input to your function by manually identifying the y's that are equal to 15:
indexToCensoredData = (y==15);
and feeding only only the non-censored data into POLYFIT.
6 comentarios
the cyclist
el 24 de Mzo. de 2011
At the risk of repeating myself: The key is to decide what you want to assume about those measurements. If you do anything other than (a) treating them like the other points, or (b) ignoring them altogether, then it seems to me you are doing a non-linear regression, and NLINFIT will do the job for you.
the cyclist
el 25 de Mzo. de 2011
I see several options here:
- If you are able to identify a priori the points where the value is over the limit, and therefore know that the datum is not a good representative of the distribution you are fitting, then I would censor those manually, and fit only the known good points.
- Alternatively, if you know the limit (say, x=15, as in your example ), you could instead fit a nonlinear function that would be something like
f = @(b,x) b.*x .* (x < 15) + 15 * (x>=15)
This would include the censored points, but I think would correctly render the linear part of the function and effectively ignore the rest.
- Finally, if you do not know the limit, you could still define a nonlinear function
f = @(bc,x) bc(1).*x .* (x < bc(2)) + bc(2) * (x>=bc(2))
where "c" (the second element of the vector) bc is the unknown limit, and you could fit both the linear piece and the limit.
Here is code that adds the second alternative to your plot:
x=(0:1:20)';
y=[-0.709 1.017 2.50 2.93 3.08 4.12 6.16 7.59 8.25 8.10 9.57 11.4 11.3 13.4 13.9 14.3 15 15 15 15 15]';
%%%correct (original) linear regression line
z=x;
%%plot censored linear regression line
p=polyfit(x,y,1);
f = @(b,x) b.*x .* (x < 15) + 15 .* (x>=15);
beta = nlinfit(x,y,f,0);
figure(1)
plot(x,y,'*',x,z,'b',x,p(2)+p(1)*x,'r',x,f(beta,x),'g');
legend('censored data','correct linear regression','wrong linear regression','nonlinear regression','Location','NorthWest')
For the third method, swap in:
f = @(bc,x) bc(1).*x .* (x < bc(2)) + bc(2) .* (x>=bc(2));
beta = nlinfit(x,y,f,[0 10])
0 comentarios
Ver también
Categorías
Más información sobre Get Started with Curve Fitting Toolbox en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!